
Workshop on Biomedical Natural Language Processing: Latest Publications

BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition
Pub Date : 2023-08-16 DOI: 10.18653/v1/2023.bionlp-1.31
Vera Pavlova, M. Makhlouf
Using language models (LMs) pre-trained in a self-supervised setting on large corpora and then fine-tuning for a downstream task has helped to deal with the problem of limited label data for supervised learning tasks such as Named Entity Recognition (NER). Recent research in biomedical language processing has offered a number of biomedical LMs pre-trained using different methods and techniques that advance results on many BioNLP tasks, including NER. However, there is still a lack of a comprehensive comparison of pre-training approaches that would work more optimally in the biomedical domain. This paper aims to investigate different pre-training methods, such as pre-training the biomedical LM from scratch and pre-training it in a continued fashion. We compare existing methods with our proposed pre-training method of initializing weights for new tokens by distilling existing weights from the BERT model inside the context where the tokens were found. The method helps to speed up the pre-training stage and improve performance on NER. In addition, we compare how masking rate, corruption strategy, and masking strategies impact the performance of the biomedical LM. Finally, using the insights from our experiments, we introduce a new biomedical LM (BIOptimus), which is pre-trained using Curriculum Learning (CL) and contextualized weight distillation method. Our model sets new states of the art on several biomedical Named Entity Recognition (NER) tasks. We release our code and all pre-trained models.
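The contextualized weight distillation step can be pictured with a minimal sketch, assuming a HuggingFace BERT checkpoint: for each new domain token, run sentences containing it through the original model and average the contextual vectors of its word pieces to initialize the new embedding. This is an illustrative reconstruction, not the authors' released code.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative sketch only: initialize the embedding of a new domain token by
# averaging the contextual hidden states BERT produces for its occurrences.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def contextual_init(new_token: str, contexts: list[str]) -> torch.Tensor:
    """Distil an initial embedding for `new_token` from sentences containing it."""
    piece_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(new_token))
    vectors = []
    for sentence in contexts:
        enc = tokenizer(sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]          # (seq_len, dim)
        ids = enc["input_ids"][0].tolist()
        # find the word-piece span of the new token and average its vectors
        for i in range(len(ids) - len(piece_ids) + 1):
            if ids[i:i + len(piece_ids)] == piece_ids:
                vectors.append(hidden[i:i + len(piece_ids)].mean(dim=0))
                break
    return torch.stack(vectors).mean(dim=0)   # average over all contexts
```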
Citations: 0
Multi-Source (Pre-)Training for Cross-Domain Measurement, Unit and Context Extraction
Pub Date : 2023-08-05 DOI: 10.18653/v1/2023.bionlp-1.1
Yueling Li, Sebastian Martschat, Simone Paolo Ponzetto
We present a cross-domain approach for automated measurement and context extraction based on pre-trained language models. We construct a multi-source, multi-domain corpus and train an end-to-end extraction pipeline. We then apply multi-source task-adaptive pre-training and fine-tuning to benchmark the cross-domain generalization capability of our model. Further, we conceptualize and apply a task-specific error analysis and derive insights for future work. Our results suggest that multi-source training leads to the best overall results, while single-source training yields the best results for the respective individual domain. While our setup is successful at extracting quantity values and units, more research is needed to improve the extraction of contextual entities. We make the cross-domain corpus used in this work available online.
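As a rough sketch of what multi-source task-adaptive pre-training can look like in practice, the snippet below pools text from several source domains and continues masked-language-model training on the mixture before task fine-tuning; the corpus file names and hyperparameters are placeholders, not the authors' pipeline.

```python
from datasets import load_dataset, concatenate_datasets
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Placeholder corpora standing in for the multi-source, multi-domain collection.
SOURCES = ["domain_a.txt", "domain_b.txt", "domain_c.txt"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Pool the sources into one dataset and tokenize it for masked-LM training.
mixed = concatenate_datasets(
    [load_dataset("text", data_files=path)["train"] for path in SOURCES]
).map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=256),
      batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tapt-multi-source", num_train_epochs=1),
    train_dataset=mixed,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()   # continued pre-training; task fine-tuning would follow
```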
Citations: 0
Building a Corpus for Biomedical Relation Extraction of Species Mentions
Pub Date : 2023-06-14 DOI: 10.48550/arXiv.2306.08403
Oumaima El Khettari, Solen Quiniou, Samuel Chaffron
We present a manually annotated new corpus, Species-Species Interaction (SSI), for extracting meaningful binary relations between species, in biomedical texts, at sentence level, with a focus on the gut microbiota. The corpus leverages PubTator to annotate species in full-text articles after evaluating different NER species taggers. Our first results are promising for extracting relations between species using BERT and its biomedical variants.
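As an illustration of the sentence-level setting such a corpus supports, the sketch below scores whether a sentence expresses an interaction between two marked species mentions with a biomedical BERT variant; the model choice, the entity markers, and the (untrained) classification head are assumptions rather than the authors' system.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical setup: a biomedical BERT variant with a binary classification head.
# The head below is randomly initialized; in practice it would be fine-tuned on SSI.
MODEL = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

sentence = ("[S1] Lactobacillus reuteri [/S1] inhibits the growth of "
            "[S2] Escherichia coli [/S2] in the gut.")
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("interaction" if logits.argmax(-1).item() == 1 else "no interaction")
```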
Citations: 0
Good Data, Large Data, or No Data? Comparing Three Approaches in Developing Research Aspect Classifiers for Biomedical Papers
Pub Date : 2023-06-07 DOI: 10.48550/arXiv.2306.04820
S. Chandrasekhar, Chieh-Yang Huang, Ting Huang
The rapid growth of scientific publications, particularly during the COVID-19 pandemic, emphasizes the need for tools to help researchers efficiently comprehend the latest advancements. One essential part of understanding scientific literature is research aspect classification, which categorizes sentences in abstracts into Background, Purpose, Method, and Finding. In this study, we investigate the impact of different datasets on model performance for the crowd-annotated CODA-19 research aspect classification task. Specifically, we explore the potential benefits of using the large, automatically curated PubMed 200K RCT dataset and evaluate the effectiveness of large language models (LLMs), such as LLaMA, GPT-3, ChatGPT, and GPT-4. Our results indicate that using the PubMed 200K RCT dataset does not improve performance for the CODA-19 task. We also observe that while GPT-4 performs well, it does not outperform the SciBERT model fine-tuned on the CODA-19 dataset, emphasizing the importance of a dedicated and task-aligned dataset for the target task.
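For reference, fine-tuning SciBERT as a research-aspect classifier follows the standard sequence-classification recipe sketched below; the toy examples and hyperparameters are placeholders rather than the paper's exact setup.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["Background", "Purpose", "Method", "Finding"]
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased", num_labels=len(LABELS))

# Tiny stand-in for the CODA-19 training split (sentence, aspect-label index).
data = Dataset.from_dict({
    "text": ["COVID-19 has spread rapidly across the world.",
             "We fine-tune a transformer-based sentence classifier."],
    "label": [0, 2],
}).map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="coda19-aspect", num_train_epochs=3),
    train_dataset=data,
    tokenizer=tokenizer,      # enables dynamic padding via the default collator
)
trainer.train()
```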
Citations: 0
Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers
Pub Date : 2023-06-07 DOI: 10.48550/arXiv.2306.04504
Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, J. Huang
ChatGPT is a large language model developed by OpenAI. Despite its impressive performance across various tasks, no prior work has investigated its capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of ChatGPT on various benchmark biomedical tasks, such as relation extraction, document classification, question answering, and summarization. To the best of our knowledge, this is the first work that conducts an extensive evaluation of ChatGPT in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot ChatGPT even outperforms the state-of-the-art fine-tuned generative transformer models, such as BioGPT and BioBART. This suggests that ChatGPT’s pre-training on large text corpora makes it quite specialized even in the biomedical domain. Our findings demonstrate that ChatGPT has the potential to be a valuable tool for various tasks in the biomedical domain that lack large annotated data.
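Zero-shot evaluation of this kind amounts to wrapping each test instance in a task instruction and parsing the reply; the sketch below shows one possible relation-extraction prompt through the OpenAI client, with wording that is illustrative rather than the paper's template.

```python
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

sentence = ("Aspirin reduces the risk of myocardial infarction but may "
            "increase the risk of gastrointestinal bleeding.")
prompt = (
    "Extract all (drug, effect) relations from the sentence below and return "
    'them as a JSON list of {"drug": ..., "effect": ...} objects.\n\n'
    f"Sentence: {sentence}"
)
response = client.chat.completions.create(
    model="gpt-3.5-turbo",                       # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)       # model's extracted relations
```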
Citations: 5
shs-nlp at RadSum23: Domain-Adaptive Pre-training of Instruction-tuned LLMs for Radiology Report Impression Generation
Pub Date : 2023-06-05 DOI: 10.48550/arXiv.2306.03264
Sanjeev Kumar Karn, Rikhiya Ghosh, P. Kusuma, Oladimeji Farri
Instruction-tuned generative large language models (LLMs), such as ChatGPT and Bloomz, possess excellent generalization abilities. However, they face limitations in understanding radiology reports, particularly when generating the IMPRESSIONS section from the FINDINGS section. These models tend to produce either verbose or incomplete IMPRESSIONS, mainly due to insufficient exposure to medical text data during training. We present a system that leverages large-scale medical text data for domain-adaptive pre-training of instruction-tuned LLMs, enhancing their medical knowledge and performance on specific medical tasks. We demonstrate that this system performs better in a zero-shot setting compared to several pretrain-and-finetune adaptation methods on the IMPRESSIONS generation task. Furthermore, it ranks 1st among participating systems in Task 1B: Radiology Report Summarization.
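Instruction-style training data for this task pairs a FINDINGS section with its IMPRESSIONS target; a hypothetical record could look like the sketch below (the field names and report text are invented, not the shared-task schema).

```python
import json

# One invented instruction-tuning record for FINDINGS -> IMPRESSIONS generation.
example = {
    "instruction": "Summarize the radiology FINDINGS into a concise IMPRESSIONS section.",
    "input": ("FINDINGS: The lungs are clear without focal consolidation. "
              "No pleural effusion or pneumothorax. Heart size is normal."),
    "output": "IMPRESSIONS: No acute cardiopulmonary abnormality.",
}
with open("radiology_instructions.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```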
Citations: 4
Team:PULSAR at ProbSum 2023:PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients’ Problems and Data Augmentation with Black-box Large Language Models
Pub Date : 2023-06-05 DOI: 10.48550/arXiv.2306.02754
Hao Li, Yuping Wu, Viktor Schlegel, R. Batista-Navarro, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Xiaojun Zeng, Daniel Beck, Stefan Winkler, G. Nenadic
Medical progress notes play a crucial role in documenting a patient’s hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers. Automatic summarisation of a patient’s problems in the form of a “problem list” can aid stakeholders in understanding a patient’s condition, reducing workload and cognitive bias. BioNLP 2023 Shared Task 1A focusses on generating a list of diagnoses and problems from the provider’s progress notes during hospitalisation. In this paper, we introduce our proposed approach to this task, which integrates two complementary components. One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients’ problems summarised as a list. Our approach was ranked second among all submissions to the shared task. The performance of our model on the development and test datasets shows that our approach is more robust on unknown data, with an improvement of up to 3.1 points over the same size of the larger model.
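The data-augmentation component can be pictured as prompting a black-box LLM to rewrite existing notes while preserving their clinical content; the template below is a hypothetical illustration, not the prompt used by the team.

```python
# Hypothetical augmentation prompt builder; the wording and sample note are
# illustrative only and would be sent to whichever black-box LLM is available.
def build_augmentation_prompt(note: str) -> str:
    return (
        "Rewrite the following progress-note excerpt in different wording, "
        "keeping every clinical fact and abbreviation meaning unchanged:\n\n"
        + note
    )

note = ("Pt with T2DM and CKD stage 3 admitted for left leg cellulitis, "
        "started on IV antibiotics.")
print(build_augmentation_prompt(note))
```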
Citations: 1
Automatic Glossary of Clinical Terminology: a Large-Scale Dictionary of Biomedical Definitions Generated from Ontological Knowledge
Pub Date : 2023-06-01 DOI: 10.48550/arXiv.2306.00665
François Remy, Thomas Demeester
Background: More than 400,000 biomedical concepts and some of their relationships are contained in SnomedCT, a comprehensive biomedical ontology. However, their concept names are not always readily interpretable by non-experts, or patients looking at their own electronic health records (EHR). Clear definitions or descriptions in understandable language are often not available. Therefore, generating human-readable definitions for biomedical concepts might help make the information they encode more accessible and understandable to a wider public. Objective: In this article, we introduce the Automatic Glossary of Clinical Terminology (AGCT), a large-scale biomedical dictionary of clinical concepts generated using high-quality information extracted from the biomedical knowledge contained in SnomedCT. Methods: We generate a novel definition for every SnomedCT concept, after prompting the OpenAI Turbo model, a variant of GPT 3.5, using a high-quality verbalization of the SnomedCT relationships of the to-be-defined concept. A significant subset of the generated definitions was subsequently evaluated by NLP researchers with biomedical expertise on 5-point scales along the following three axes: factuality, insight, and fluency. Results: AGCT contains 422,070 computer-generated definitions for SnomedCT concepts, covering various domains such as diseases, procedures, drugs, and anatomy. The average length of the definitions is 49 words. The definitions were assigned average scores of over 4.5 out of 5 on all three axes, indicating a majority of factual, insightful, and fluent definitions. Conclusion: AGCT is a novel and valuable resource for biomedical tasks that require human-readable definitions for SnomedCT concepts. It can also serve as a base for developing robust biomedical retrieval models or other applications that leverage natural language understanding of biomedical knowledge.
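The verbalization step turns a concept's ontology relations into plain statements that the model is told to treat as the only available facts; the triples and prompt wording below are invented for illustration, not taken from SnomedCT or the paper.

```python
# Invented example: verbalize a concept's relations and build a definition prompt.
concept = "Myocardial infarction"
relations = [
    ("is a", "Ischemic heart disease"),
    ("finding site", "Myocardium structure"),
    ("associated morphology", "Infarct"),
]
facts = "; ".join(f"{concept} -> {rel} -> {target}" for rel, target in relations)
prompt = (
    f"Using only the following facts, write a short plain-language definition "
    f"of '{concept}'.\nFacts: {facts}\nDefinition:"
)
print(prompt)   # this text would be sent to the GPT 3.5 Turbo model
```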
Citations: 0
Comparing and combining some popular NER approaches on Biomedical tasks
Pub Date : 2023-05-30 DOI: 10.48550/arXiv.2305.19120
Harsh Verma, S. Bergler, Narjes Tahaei
We compare three simple and popular approaches for NER: 1) SEQ (sequence labeling with a linear token classifier) 2) SeqCRF (sequence labeling with Conditional Random Fields), and 3) SpanPred (span prediction with boundary token embeddings). We compare the approaches on 4 biomedical NER tasks: GENIA, NCBI-Disease, LivingNER (Spanish), and SocialDisNER (Spanish). The SpanPred model demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 1.3 and 0.6 F1 respectively. The SeqCRF model also demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 0.2 F1 and 0.7 respectively. The SEQ model is competitive with the state-of-the-art on LivingNER dataset. We explore some simple ways of combining the three approaches. We find that majority voting consistently gives high precision and high F1 across all 4 datasets.Lastly, we implement a system that learns to combine SEQ’s and SpanPred’s predictions, generating systems that give high recall and high F1 across all 4 datasets. On the GENIA dataset, we find that our learned combiner system significantly boosts F1(+1.2) and recall(+2.1) over the systems being combined.
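The majority-voting combination reduces to a few lines: count how many systems predicted each (start, end, label) span and keep those predicted by at least two of the three. The spans in the example below are invented.

```python
from collections import Counter

def majority_vote(system_outputs, min_votes=2):
    """Keep spans predicted by at least `min_votes` of the systems."""
    counts = Counter(span for output in system_outputs for span in set(output))
    return {span for span, votes in counts.items() if votes >= min_votes}

seq      = [(0, 4, "Species"), (10, 17, "Species")]
seq_crf  = [(0, 4, "Species")]
spanpred = [(0, 4, "Species"), (10, 17, "Species"), (20, 25, "Species")]

print(majority_vote([seq, seq_crf, spanpred]))
# -> {(0, 4, 'Species'), (10, 17, 'Species')}
```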
Citations: 1
Zero-shot Temporal Relation Extraction with ChatGPT
Pub Date : 2023-04-11 DOI: 10.48550/arXiv.2304.05454
Chenhan Yuan, Qianqian Xie, S. Ananiadou
The goal of temporal relation extraction is to infer the temporal relation between two events in the document. Supervised models are dominant in this task. In this work, we investigate ChatGPT’s ability on zero-shot temporal relation extraction. We designed three different prompt techniques to break down the task and evaluate ChatGPT. Our experiments show that ChatGPT’s performance has a large gap with that of supervised methods and can heavily rely on the design of prompts. We further demonstrate that ChatGPT can infer more small relation classes correctly than supervised methods. The current shortcomings of ChatGPT on temporal relation extraction are also discussed in this paper. We found that ChatGPT cannot keep consistency during temporal inference and it fails in actively long-dependency temporal inference.
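A zero-shot query for this task can be framed as asking the model to label the relation between a marked event pair; the sketch below shows one such prompt with an illustrative label set, not the three prompt designs evaluated in the paper.

```python
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

document = ("The patient was admitted on June 1. Surgery was performed two "
            "days after admission.")
prompt = (
    "Read the document below and state the temporal relation between the events "
    "'admitted' and 'Surgery'. Answer with exactly one of: BEFORE, AFTER, OVERLAP.\n\n"
    f"Document: {document}"
)
reply = client.chat.completions.create(
    model="gpt-3.5-turbo",                        # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(reply.choices[0].message.content)
```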
Citations: 20