
Workshop on Biomedical Natural Language Processing: Latest Publications

Stress Test Evaluation of Biomedical Word Embeddings
Pub Date: 2021-07-24 | DOI: 10.18653/v1/2021.bionlp-1.13
Vladimir Araujo, Andrés Carvallo, C. Aspillaga, C. Thorne, Denis Parra
The success of pretrained word embeddings has motivated their use in the biomedical domain, with contextualized embeddings yielding remarkable results in several biomedical NLP tasks. However, there is a lack of research on quantifying their behavior under severe “stress” scenarios. In this work, we systematically evaluate three language models with adversarial examples – automatically constructed tests that allow us to examine how robust the models are. We propose two types of stress scenarios focused on the biomedical named entity recognition (NER) task, one inspired by spelling errors and another based on the use of synonyms for medical terms. Our experiments with three benchmarks show that the performance of the original models decreases considerably, in addition to revealing their weaknesses and strengths. Finally, we show that adversarial training causes the models to improve their robustness and even to exceed the original performance in some cases.
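The two stress types lend themselves to simple generators. A minimal sketch, assuming whitespace tokenization and a toy synonym table standing in for a real medical vocabulary such as UMLS (this is not the authors' exact test generator):

```python
# Sketch of the two perturbation types: spelling errors and synonym swaps.
# The SYNONYMS table is a hypothetical stand-in for a real medical lexicon.
import random

SYNONYMS = {"myocardial infarction": "heart attack",
            "hypertension": "high blood pressure"}

def add_typo(token: str, rng: random.Random) -> str:
    """Swap two adjacent characters to simulate a keyboard spelling error."""
    if len(token) < 4:
        return token
    i = rng.randrange(len(token) - 1)
    return token[:i] + token[i + 1] + token[i] + token[i + 2:]

def spelling_stress(sentence: str, rate: float = 0.3, seed: int = 13) -> str:
    """Perturb a fraction of tokens with character-level noise."""
    rng = random.Random(seed)
    return " ".join(add_typo(t, rng) if rng.random() < rate else t
                    for t in sentence.split())

def synonym_stress(sentence: str) -> str:
    """Replace medical terms with lay synonyms."""
    for term, syn in SYNONYMS.items():
        sentence = sentence.replace(term, syn)
    return sentence

print(spelling_stress("patient diagnosed with hypertension"))
print(synonym_stress("patient diagnosed with hypertension"))
```

Running a fixed NER model over the original and perturbed versions of a benchmark and comparing F-scores then quantifies the robustness gap.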
Citations: 8
Measuring the relative importance of full text sections for information retrieval from scientific literature.
Pub Date: 2021-06-01 | DOI: 10.18653/v1/2021.bionlp-1.27
Lana Yeganova, Won Kim, Donald C. Comeau, W. Wilbur, Zhiyong Lu
With the growing availability of full-text articles, integrating abstracts and full texts of documents into a unified representation is essential for comprehensive search of scientific literature. However, previous studies have shown that naïvely merging abstracts with full texts of articles does not consistently yield better performance. Balancing the contribution of query terms appearing in the abstract and in sections of different importance in full text articles remains a challenge both with traditional bag-of-words IR approaches and for neural retrieval methods. In this work we establish the connection between the BM25 score of a query term appearing in a section of a full text document and the probability of that document being clicked or identified as relevant. Probability is computed using Pool Adjacent Violators (PAV), an isotonic regression algorithm, providing a maximum likelihood estimate based on the observed data. Using this probabilistic transformation of BM25 scores we show an improved performance on the PubMed Click dataset developed and presented in this study, as well as the 2007 TREC Genomics collection.
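The PAV step maps raw BM25 scores onto a monotone estimate of click probability. A minimal sketch using scikit-learn's isotonic regression (which implements PAV), with made-up scores and click labels standing in for the PubMed Click data:

```python
# Fit a monotone mapping from BM25 score to P(click) with isotonic regression.
# Scores and labels below are illustrative, not the paper's data.
import numpy as np
from sklearn.isotonic import IsotonicRegression

bm25_scores = np.array([1.2, 2.5, 3.1, 4.0, 5.5, 6.3, 7.8, 9.1])
clicked     = np.array([0,   0,   1,   0,   1,   1,   1,   1  ])  # 1 = clicked/relevant

pav = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
pav.fit(bm25_scores, clicked)

# Maximum-likelihood, monotone estimate of P(click | BM25) for unseen scores.
print(pav.predict(np.array([2.0, 5.0, 8.0])))
```

Fitting one such mapping per section type (title, abstract, methods, and so on) yields comparable probabilities that can replace raw section-level BM25 contributions at ranking time.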
Citations: 2
Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts
Pub Date: 2021-04-30 | DOI: 10.18653/v1/2021.bionlp-1.19
Christoph Müller, Sucheta Ghosh, Ulrike Wittig, Maja Rey
We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.
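For illustration, word-level alignment of this kind can be approximated with a standard sequence matcher; the sketch below uses Python's difflib rather than the authors' actual off-the-shelf components, and the OCR string is invented:

```python
# Align an (imperfect) OCR token sequence against the electronic full text.
from difflib import SequenceMatcher

ocr  = "the patlent was given 5 mg of aspirin".split()          # OCR side, with an error
full = "the patient was given 5 mg of aspirin daily".split()    # full-text side

matcher = SequenceMatcher(a=ocr, b=full, autojunk=False)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    # 'equal' spans are direct word-level alignments; 'replace' spans pair up
    # OCR errors with their full-text counterparts; 'insert'/'delete' are gaps.
    print(tag, ocr[i1:i2], "->", full[j1:j2])
```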
Citations: 1
Improving BERT Model Using Contrastive Learning for Biomedical Relation Extraction
Pub Date: 2021-04-28 | DOI: 10.18653/v1/2021.bionlp-1.1
P. Su, Yifan Peng, K. Vijay-Shanker
Contrastive learning has been used to learn a high-quality representation of the image in computer vision. However, contrastive learning is not widely utilized in natural language processing due to the lack of a general method of data augmentation for text data. In this work, we explore the method of employing contrastive learning to improve the text representation from the BERT model for relation extraction. The key knob of our framework is a unique contrastive pre-training step tailored for the relation extraction tasks by seamlessly integrating linguistic knowledge into the data augmentation. Furthermore, we investigate how large-scale data constructed from the external knowledge bases can enhance the generality of contrastive pre-training of BERT. The experimental results on three relation extraction benchmark datasets demonstrate that our method can improve the BERT model representation and achieve state-of-the-art performance. In addition, we explore the interpretability of models by showing that BERT with contrastive pre-training relies more on rationales for prediction. Our code and data are publicly available at: https://github.com/AnonymousForNow.
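A contrastive pre-training step of this general shape typically optimizes an InfoNCE-style objective over paired sentence views. A schematic PyTorch sketch, omitting the paper's knowledge-driven data augmentation:

```python
# InfoNCE-style contrastive loss over sentence embeddings: two augmented views
# of the same sentence are positives; all other in-batch pairs are negatives.
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.1):
    """anchor, positive: (batch, dim) embeddings of two views of each sentence."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.T / tau            # (batch, batch) scaled cosine similarities
    labels = torch.arange(a.size(0))  # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy demo with random vectors in place of BERT sentence representations.
loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```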
Citations: 23
Claim Detection in Biomedical Twitter Posts
Pub Date: 2021-04-23 | DOI: 10.18653/v1/2021.bionlp-1.15
Amelie Wuhrl, Roman Klinger
Social media contains unfiltered and unique information, which is potentially of great value but, in the case of misinformation, can also do great harm. With regard to biomedical topics, false information can be particularly dangerous. Methods of automatic fact-checking and fake news detection address this problem, but have not yet been applied to the biomedical domain in social media. We aim to fill this research gap and annotate a corpus of 1200 tweets for implicit and explicit biomedical claims (the latter also with span annotations for the claim phrase). With this corpus, which we sample to be related to COVID-19, measles, cystic fibrosis, and depression, we develop baseline models that automatically detect tweets containing a claim. Our analyses reveal that biomedical tweets are densely populated with claims (45% of a corpus sampled to contain 1200 tweets focused on the domains mentioned above). Baseline classification experiments with embedding-based classifiers and BERT-based transfer learning demonstrate that detection is challenging, but show acceptable performance for identifying explicit expressions of claims. Implicit claim tweets are more challenging to detect.
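A BERT-based baseline for this task reduces to binary sequence classification. A minimal sketch with Hugging Face Transformers; the checkpoint name and example tweet are placeholders, and the fine-tuning loop on the annotated corpus is omitted:

```python
# Binary claim/no-claim classifier head on top of a BERT encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "bert-base-uncased"  # assumption: any BERT-family checkpoint works here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tok(["Vitamin D prevents measles."], return_tensors="pt",
            truncation=True, padding=True)
with torch.no_grad():
    probs = model(**batch).logits.softmax(dim=-1)
# [P(no claim), P(claim)]; meaningful only after fine-tuning on the corpus.
print(probs)
```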
Citations: 17
Improving Biomedical Pretrained Language Models with Knowledge
Pub Date: 2021-04-21 | DOI: 10.18653/v1/2021.bionlp-1.20
Zheng Yuan, Yijia Liu, Chuanqi Tan, Songfang Huang, Fei Huang
Pretrained language models have shown success in many natural language processing tasks, and many works have explored incorporating external knowledge into them. In the biomedical domain, experts have spent decades building large-scale knowledge bases. For example, UMLS contains millions of entities with their synonyms and defines hundreds of relations among entities. Leveraging this knowledge can benefit a variety of downstream tasks such as named entity recognition and relation extraction. To this end, we propose KeBioLM, a biomedical pretrained language model that explicitly leverages knowledge from the UMLS knowledge base. Specifically, we extract entities from PubMed abstracts and link them to UMLS. We then train a knowledge-aware language model that first applies a text-only encoding layer to learn entity representations and then applies a text-entity fusion encoding to aggregate them. In addition, we add two training objectives: entity detection and entity linking. Experiments on the named entity recognition and relation extraction tasks from the BLURB benchmark demonstrate the effectiveness of our approach. Further analysis on a collected probing dataset shows that our model has a better ability to model medical knowledge.
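The text-entity fusion encoding can be pictured as adding linked-entity embeddings to token states before a further encoding layer. A toy sketch, with dimensions and the entity linker as stand-ins rather than KeBioLM's actual architecture:

```python
# Toy text-entity fusion: sum each token state with the embedding of its
# linked UMLS entity (id 0 = no linked entity), then project.
import torch
import torch.nn as nn

class TextEntityFusion(nn.Module):
    def __init__(self, hidden: int = 768, num_entities: int = 10000):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, hidden, padding_idx=0)
        self.fuse = nn.Linear(hidden, hidden)

    def forward(self, token_states: torch.Tensor, entity_ids: torch.Tensor):
        # token_states: (batch, seq, hidden) from a text-only encoding layer
        # entity_ids:   (batch, seq) linked entity id per token
        return self.fuse(token_states + self.entity_emb(entity_ids))

fusion = TextEntityFusion()
out = fusion(torch.randn(2, 16, 768), torch.zeros(2, 16, dtype=torch.long))
print(out.shape)  # torch.Size([2, 16, 768])
```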
Citations: 59
Towards BERT-based Automatic ICD Coding: Limitations and Opportunities
Pub Date: 2021-04-14 | DOI: 10.18653/v1/2021.bionlp-1.6
Damian Pascual, Sandro Luck, Roger Wattenhofer
Automatic ICD coding is the task of assigning codes from the International Classification of Diseases (ICD) to medical notes. These codes describe the state of the patient and have multiple applications, e.g., computer-assisted diagnosis or epidemiological studies. ICD coding is a challenging task due to the complexity and length of medical notes. Unlike the general trend in language processing, no transformer model has been reported to reach high performance on this task. Here, we investigate ICD coding in detail using PubMedBERT, a state-of-the-art transformer model for biomedical language understanding. We find that the difficulty of fine-tuning the model on long pieces of text is the main limitation of BERT-based models for ICD coding. We run extensive experiments and show that, despite the gap with the current state of the art, pretrained transformers can reach competitive performance using relatively small portions of text. We point to better methods for aggregating information from long texts as the main need for improving BERT-based ICD coding.
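One common workaround for the length limitation discussed here is to split a note into overlapping chunks, encode each chunk, and pool the per-chunk logits into a single multi-label prediction. A sketch under those assumptions (chunk size, stride, and max-pooling are illustrative choices, not the paper's exact setup):

```python
# Chunk a long note, score each chunk, and max-pool per-code logits so that
# evidence for an ICD code from any chunk counts.
import torch
import torch.nn as nn

def chunks(ids: torch.Tensor, size: int = 512, stride: int = 256):
    """Yield overlapping windows over a 1-D tensor of token ids."""
    for start in range(0, max(1, ids.size(0) - size + 1), stride):
        yield ids[start:start + size]

def predict_icd(encode, head, ids: torch.Tensor) -> torch.Tensor:
    # encode: maps (1, seq) token ids to a (1, hidden) [CLS]-style vector
    logits = torch.stack([head(encode(c.unsqueeze(0))) for c in chunks(ids)])
    return logits.max(dim=0).values.sigmoid()  # per-code probabilities

# Toy stand-ins so the sketch runs end to end; a real setup would use a
# PubMedBERT encoder and a linear head over the ICD label space.
hidden, num_codes = 768, 50
encode = lambda x: torch.randn(x.size(0), hidden)
head = nn.Linear(hidden, num_codes)
print(predict_icd(encode, head, torch.randint(0, 30000, (1300,))).shape)
```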
Citations: 34
Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.bionlp-1.17
Brian Hur, Timothy Baldwin, Karin M. Verspoor, L. Hardefeldt, J. Gilkerson
Identifying the reasons for antibiotic administration in veterinary records is a critical component of understanding antimicrobial usage patterns. This informs antimicrobial stewardship programs designed to fight antimicrobial resistance, a major health crisis affecting both humans and animals in which veterinarians have an important role to play. We propose a document classification approach to determine the reason for administration of a given drug, with particular focus on domain adaptation from one drug to another, and instance selection to minimize annotation effort.
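Instance selection to minimize annotation effort is often implemented as uncertainty sampling: route the documents the current model is least confident about to the annotators. A generic sketch (the paper's own selection criterion may differ):

```python
# Least-confidence instance selection for an annotation budget.
import numpy as np

def select_for_annotation(probs: np.ndarray, budget: int) -> np.ndarray:
    """probs: (n_docs, n_classes) predicted class probabilities.
    Returns indices of the `budget` least-confident documents."""
    confidence = probs.max(axis=1)          # confidence in the top class
    return np.argsort(confidence)[:budget]  # ascending: least confident first

probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.51, 0.49]])
print(select_for_annotation(probs, budget=2))  # -> [3 1]
```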
Citations: 8
DeSpin: a prototype system for detecting spin in biomedical publications
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.bionlp-1.5
A. Koroleva, S. Kamath, P. Bossuyt, P. Paroubek
Improving the quality of medical research reporting is crucial to reduce avoidable waste in research and to improve the quality of health care. Despite various initiatives aiming at improving research reporting – guidelines, checklists, authoring aids, peer review procedures, etc. – overinterpretation of research results, also known as spin, is still a serious issue in research reporting. In this paper, we propose a Natural Language Processing (NLP) system for detecting several types of spin in biomedical articles reporting randomized controlled trials (RCTs). We use a combination of rule-based and machine learning approaches to extract important information on trial design and to detect potential spin. The proposed spin detection system includes algorithms for text structure analysis, sentence classification, entity and relation extraction, and semantic similarity assessment. Our algorithms achieved operational performance on these tasks, with F-measures ranging from 79.42% to 97.86% across tasks. The most difficult task is extracting reported outcomes. Our tool is intended as a semi-automated aid for both authors and peer reviewers to detect potential spin. It incorporates a simple interface that allows users to run the algorithms and visualize their output, and can also be used for manual annotation and correction of errors in the outputs. The proposed tool is the first tool for spin detection. The tool and the annotated dataset are freely available.
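The semantic-similarity component can be illustrated by comparing a registered primary outcome against outcomes reported in an abstract and flagging low-similarity pairs. A sketch using sentence-transformers; the model name, example outcomes, and the 0.5 threshold are assumptions, not the system's actual components:

```python
# Flag possible outcome switching by embedding similarity between the
# registered primary outcome and each reported outcome.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose model

registered = "overall survival at 12 months"
reported = ["progression-free survival at 6 months",
            "overall survival at one year"]

sims = util.cos_sim(model.encode([registered]), model.encode(reported))[0]
for outcome, sim in zip(reported, sims):
    flag = "ok" if sim > 0.5 else "possible outcome switch"
    print(f"{sim:.2f}  {outcome}  [{flag}]")
```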
Citations: 1
Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.bionlp-1.11
Yuxia Wang, Fei Liu, Karin M. Verspoor, Timothy Baldwin
In this paper, we apply pre-trained language models to the Semantic Textual Similarity (STS) task, with a specific focus on the clinical domain. In the low-resource setting of clinical STS, these large models tend to be impractical and prone to overfitting. Building on BERT, we study the impact of a number of model design choices, namely different fine-tuning and pooling strategies. We observe that the impact of domain-specific fine-tuning on clinical STS is much smaller than in the general domain, likely due to the concept richness of the domain. Based on this, we propose two data augmentation techniques. Experimental results on N2C2-STS demonstrate substantial improvements, validating the utility of the proposed methods.
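Two of the pooling strategies commonly compared in this setting are taking the [CLS] vector versus mean-pooling token states under the attention mask. A sketch with purely illustrative tensors:

```python
# [CLS] pooling vs. masked mean pooling over encoder outputs, with cosine
# similarity as the STS score for a sentence pair.
import torch
import torch.nn.functional as F

def cls_pool(states: torch.Tensor) -> torch.Tensor:
    return states[:, 0]                      # first token = [CLS]

def mean_pool(states: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    m = mask.unsqueeze(-1).float()           # (batch, seq, 1)
    return (states * m).sum(1) / m.sum(1).clamp(min=1e-9)

states = torch.randn(2, 10, 768)             # encoder output for two sentences
mask = torch.ones(2, 10, dtype=torch.long)   # no padding in this toy batch

print(F.cosine_similarity(cls_pool(states)[:1], cls_pool(states)[1:]).item())
print(F.cosine_similarity(mean_pool(states, mask)[:1],
                          mean_pool(states, mask)[1:]).item())
```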
Citations: 19