Proceedings of the conference. Association for Computational Linguistics. Meeting: Latest Articles

Medical Vision-Language Pre-Training for Brain Abnormalities.

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2024-05-01

Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang

Vision-language models have become increasingly powerful for tasks that require an understanding of both visual and linguistic elements, bridging the gap between these modalities. In the context of multimodal clinical AI, there is a growing need for models that possess domain-specific knowledge, as existing models often lack the expertise required for medical applications. In this paper, we take brain abnormalities as an example to demonstrate how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed. In particular, we present a pipeline that streamlines the pre-training process by first collecting a large brain image-text dataset from case reports and published journals and then constructing a high-performance vision-language model tailored to specific medical tasks. We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain. We evaluate the resulting model with both quantitative and qualitative intrinsic evaluations. The resulting dataset and our code are available at https://github.com/masoud-monajati/MedVL_pretraining_pipeline.
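The abstract names subfigure-to-subcaption mapping as a challenge but does not describe the method used. As a rough illustration of the text side of that problem only, the sketch below assumes that compound captions mark panels with single-letter labels such as "(a)" and "(b)"; the regex, the function name `split_subcaptions`, and the example caption are all hypothetical, and real captions follow many other conventions.

```python
import re

# Hypothetical convention: panels are labelled "(a)", "(b)", ... in the caption.
PANEL_LABEL = re.compile(r"\(([a-zA-Z])\)\s*")


def split_subcaptions(caption: str) -> dict[str, str]:
    """Split a compound caption into {panel_label: subcaption}.

    Text before the first panel label is treated as a shared preamble and
    prepended to every subcaption. Returns an empty dict if no labels occur.
    """
    matches = list(PANEL_LABEL.finditer(caption))
    if not matches:
        return {}
    preamble = caption[: matches[0].start()].strip()
    subcaptions = {}
    for i, match in enumerate(matches):
        # Each subcaption runs until the next label (or the end of the caption).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        body = caption[match.end():end].strip().rstrip(";,").strip()
        label = match.group(1).lower()
        subcaptions[label] = f"{preamble} {body}" if preamble else body
    return subcaptions


# Hypothetical example caption, not taken from the dataset.
caption = (
    "Brain MRI of a 45-year-old patient. "
    "(a) Axial T2 image showing a lesion; (b) Sagittal T1 image after contrast."
)
for label, text in split_subcaptions(caption).items():
    print(label, "->", text)
```

Matching each subcaption to the correct image panel (the visual side of the problem) would still require a separate step such as panel detection, which this sketch does not attempt.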

Volume: 2024 LREC/COLING, Pages: 11159-11164, PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11238846/pdf/

Citations: 0

HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification.

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2024-03-01

Vidit Jain, Mukund Rungta, Yuchen Zhuang, Yue Yu, Zeyu Wang, Mu Gao, Jeffrey Skolnick, Chao Zhang

Hierarchical text classification (HTC) is a complex subtask under multi-label text classification, characterized by a hierarchical label taxonomy and data imbalance. The best-performing models aim to learn a static representation by combining document and hierarchical label information. However, the relevance of document sections can vary based on the hierarchy level, necessitating a dynamic document representation. To address this, we propose HiGen, a text-generation-based framework utilizing language models to encode dynamic text representations. We introduce a level-guided loss function to capture the relationship between text and label name semantics. Our approach incorporates a task-specific pretraining strategy, adapting the language model to in-domain knowledge and significantly enhancing performance for classes with limited examples. Furthermore, we present a new and valuable dataset called ENZYME, designed for HTC, which comprises articles from PubMed with the goal of predicting Enzyme Commission (EC) numbers. Through extensive experiments on the ENZYME dataset and the widely recognized WOS and NYT datasets, our methodology demonstrates superior performance, surpassing existing approaches while efficiently handling data and mitigating class imbalance. We release our code and dataset here: https://github.com/viditjain99/HiGen.
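EC numbers are four-level codes (class, subclass, sub-subclass, serial number, e.g. 3.4.21.4), so every label implies its ancestors. The abstract does not specify HiGen's target format; the sketch below assumes one plausible serialization for a generation-based classifier, in which a document's EC labels are expanded into their ancestor paths and emitted coarse-to-fine, one hierarchy level per segment. The function names and the `" | "` and `", "` separators are hypothetical.

```python
def ec_label_path(ec_number: str) -> list[str]:
    """Expand an EC number into its hierarchy path.

    '3.4.21.4' -> ['3', '3.4', '3.4.21', '3.4.21.4']; partially specified
    numbers such as '3.4.-.-' stop at the last specified level.
    """
    parts = ec_number.strip().split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"not an EC number: {ec_number!r}")
    path = []
    for depth, part in enumerate(parts, start=1):
        if part == "-":
            break
        if not part.isdigit():
            raise ValueError(f"not an EC number: {ec_number!r}")
        path.append(".".join(parts[:depth]))
    return path


def to_target_sequence(ec_numbers: list[str]) -> str:
    """Serialize a document's labels level by level (coarse to fine)."""
    levels: dict[int, list[str]] = {}
    for ec in ec_numbers:
        for depth, label in enumerate(ec_label_path(ec), start=1):
            bucket = levels.setdefault(depth, [])
            if label not in bucket:  # shared ancestors are emitted once
                bucket.append(label)
    return " | ".join(", ".join(levels[d]) for d in sorted(levels))


print(to_target_sequence(["3.4.21.4", "3.4.22.2"]))
```

Emitting shared ancestors once keeps the target short when a document carries several related labels; how HiGen actually orders labels and weights each level in its level-guided loss is described in the paper, not here.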

Volume: 2024 EACL, Pages: 1354-1368, PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11781299/pdf/

Citations: 0

Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles.

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2023-07-01

Weipeng Zhou, Dmitriy Dligach, Majid Afshar, Yanjun Gao, Timothy A Miller

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs outperform supervised BERT-based models on out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique yields additional performance gains.
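The abstract does not specify the ensemble technique. One simple option, shown here purely as an assumption, is a weighted soft vote: the supervised model's class probabilities are combined with a one-hot vote for the zero-shot LLM's label. The function name, the section labels, and the default `llm_weight` of 0.4 are hypothetical.

```python
from typing import Dict, Optional


def ensemble_predict(
    bert_probs: Dict[str, float],
    llm_label: Optional[str],
    llm_weight: float = 0.4,
) -> str:
    """Weighted soft vote between a supervised classifier and a zero-shot LLM.

    bert_probs: section label -> probability from the supervised model.
    llm_label: label returned by the LLM, or None if its answer could not be
        mapped onto the label set.
    llm_weight: weight of the LLM's one-hot vote (hypothetical default).
    """
    if not 0.0 <= llm_weight <= 1.0:
        raise ValueError("llm_weight must be in [0, 1]")
    if llm_label not in bert_probs:
        llm_weight = 0.0  # unusable LLM answer: rely on the classifier alone
    scores = {
        label: (1.0 - llm_weight) * p + (llm_weight if label == llm_label else 0.0)
        for label, p in bert_probs.items()
    }
    return max(scores, key=scores.get)


probs = {"history": 0.5, "medications": 0.3, "plan": 0.2}
print(ensemble_predict(probs, "medications"))
print(ensemble_predict(probs, None))
```

With this rule the LLM's label wins unless the classifier prefers another section by a probability margin greater than `llm_weight / (1 - llm_weight)` (about 0.67 at the default), and an LLM answer that cannot be mapped to a known section falls back to the classifier.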

Citations: 0

Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2023-07-01 DOI: 10.18653/v1/2023.clinicalnlp-1.16

Weipeng Zhou, M. Afshar, Dmitriy Dligach, Yanjun Gao, Timothy Miller

Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs outperform supervised BERT-based models on out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique yields additional performance gains.

Citations: 0

Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses.

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2023-07-01 DOI: 10.18653/v1/2023.findings-acl.794

Liyan Tang, Yifan Peng, Yanshan Wang, Ying Ding, Greg Durrett, Justin F Rousseau

A human decision-maker benefits the most from an AI assistant that corrects for their biases. For problems such as generating the interpretation of a radiology report from its findings, a system that predicts only highly likely outcomes may be less useful, because such outcomes are already obvious to the user. To alleviate biases in human decision-making, it is worth considering a broad differential diagnosis that goes beyond the most likely options. We introduce a new task, "less likely brainstorming," that asks a model to generate outputs that humans consider relevant but less likely to happen. We explore the task in two settings: brain MRI interpretation generation and everyday commonsense reasoning. We find that a baseline approach of training with less likely hypotheses as targets generates outputs that humans judge as either likely or irrelevant nearly half of the time; standard MLE training is not effective. To tackle this problem, we propose a controlled text generation method that uses a novel contrastive learning strategy to encourage models to differentiate between generating outputs that humans consider likely and those they consider less likely. We compare our method with several state-of-the-art controlled text generation models via automatic and human evaluations and show that it improves our models' ability to generate less likely outputs.

Volume: 2023, Pages: 12532-12555, PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10494958/pdf/nihms-1923571.pdf

Citations: 0

Revisiting Relation Extraction in the era of Large Language Models.

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2023-07-01 DOI: 10.18653/v1/2023.acl-long.868

Somin Wadhwa, Silvio Amir, Byron C Wallace

Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 Large) than those considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by performing human evaluations instead of relying on exact matching. Under this refined evaluation, we find that: (1) few-shot prompting with GPT-3 achieves near-SOTA performance, i.e., roughly equivalent to existing fully supervised models; (2) Flan-T5 is less capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks.
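To make the sequence-to-sequence framing concrete, the sketch below linearizes (head, relation, tail) triples into a target string, parses generated text back into triples, and scores them by exact match. The `head | relation | tail` format, the function names, and the example triples are hypothetical, not the format used in the paper. The last line shows why exact matching can undercount generative outputs, which is the issue the authors address with human evaluation.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def linearize(triples: List[Triple]) -> str:
    """Render triples as a target string (hypothetical format)."""
    return " ; ".join(f"{h} | {r} | {t}" for h, r, t in triples)


def parse(generated: str) -> Set[Triple]:
    """Recover triples from generated text, skipping malformed chunks."""
    triples = set()
    for chunk in generated.split(";"):
        parts = [p.strip() for p in chunk.split("|")]
        if len(parts) == 3 and all(parts):
            triples.add(tuple(parts))
    return triples


def exact_match_f1(pred: Set[Triple], gold: Set[Triple]) -> float:
    """Micro F1 where a predicted triple counts only if it matches exactly."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


gold_triples = [("aspirin", "treats", "headache"), ("ibuprofen", "treats", "fever")]
target = linearize(gold_triples)
print(target)
gold = set(gold_triples)
print(exact_match_f1(parse(target), gold))
# A near-miss a human might accept still scores zero under exact matching.
print(exact_match_f1(parse("Aspirin | treats | headaches"), gold))
```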

Volume: 2023, Pages: 15566-15589, PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10482322/pdf/nihms-1912166.pdf

Citations: 0

Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes.

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2023-07-01 DOI: 10.18653/v1/2023.bionlp-1.43

Yanjun Gao, Dmitriy Dligach, Timothy Miller, Matthew M Churpek, Majid Afshar

The BioNLP Workshop 2023 launched a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system that generates relevant and accurate diagnoses will augment healthcare providers' decision-making process and improve the quality of care for patients. The goal for participants is to develop models that generate a list of diagnoses and problems from the daily care notes collected during the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems. We also summarize the techniques tried by the participating teams and the evaluation results of their approaches.

Volume: 2023, Pages: 461-467, PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10426335/pdf/nihms-1923203.pdf

Citations: 0

End-to-end clinical temporal information extraction with multi-head attention.

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2023-07-01

Timothy Miller, Steven Bethard, Dmitriy Dligach, Guergana Savova

Understanding temporal relationships in text from electronic health records can be valuable for many important downstream clinical applications. Since Clinical TempEval 2017, there has been little work on end-to-end systems for temporal relation extraction, with most work focused on the setting where gold standard events and time expressions are given. In this work, we make use of a novel multi-headed attention mechanism on top of a pre-trained transformer encoder to allow the learning process to attend to multiple aspects of the contextualized embeddings. Our system achieves state-of-the-art results on the THYME corpus by a wide margin, in both the in-domain and cross-domain settings.

Citations: 0

Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning.

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2023-07-01

Brihat Sharma, Yanjun Gao, Timothy Miller, Matthew M Churpek, Majid Afshar, Dmitriy Dligach

Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework comprising six tasks that represent key components of clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models, as well as multi-task versus single-task training, with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general-domain counterpart by a large margin, establishing a new state of the art with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
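ROUGE-L scores a generated summary by the longest common subsequence (LCS) between candidate and reference tokens: precision is LCS / |candidate|, recall is LCS / |reference|, and the F1 form combines them as 2PR / (P + R). The sketch below uses plain lowercase whitespace tokenization with no stemming, which is a simplification; its values will not exactly match standard scorers, so it is not a way to reproduce the 28.55 above. Reported scores such as 28.55 are typically on a 0-100 scale, i.e. the value computed here multiplied by 100. The example diagnoses are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, keeping two DP rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercase whitespace tokens (no stemming)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Same problems, different order: LCS respects order, so only one run matches.
print(rouge_l_f1("sepsis ; acute kidney injury", "acute kidney injury ; sepsis"))
```

The example prints a value of about 0.6: both strings list the same problems, but because the LCS must preserve order, only "acute kidney injury" (3 of 5 tokens) is matched.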

Citations: 0

End-to-end clinical temporal information extraction with multi-head attention

Proceedings of the conference. Association for Computational Linguistics. Meeting

Pub Date : 2023-07-01 DOI: 10.18653/v1/2023.bionlp-1.28

Timothy Miller, S. Bethard, Dmitriy Dligach, G. Savova

Understanding temporal relationships in text from electronic health records can be valuable for many important downstream clinical applications. Since Clinical TempEval 2017, there has been little work on end-to-end systems for temporal relation extraction, with most work focused on the setting where gold standard events and time expressions are given. In this work, we make use of a novel multi-headed attention mechanism on top of a pre-trained transformer encoder to allow the learning process to attend to multiple aspects of the contextualized embeddings. Our system achieves state-of-the-art results on the THYME corpus by a wide margin, in both the in-domain and cross-domain settings.

Citations: 0