Developing a natural language processing system using transformer-based models for adverse drug event detection in electronic health records
Jingyuan Wu, Xiaodi Ruan, Elizabeth McNeer, Katelyn M. Rossow, Leena Choi
medRxiv - Health Informatics, 2024-07-10. DOI: 10.1101/2024.07.09.24310100
Abstract
Objective:
To develop a transformer-based natural language processing (NLP) system for detecting adverse drug events (ADEs) from clinical notes in electronic health records (EHRs).
Materials and Methods:
We fine-tuned BERT Short-Formers and Clinical-Longformer using the processed dataset of the 2018 National NLP Clinical Challenges (n2c2) shared task Track 2. We investigated two data processing methods, a window-based and a split-based approach, to identify the optimal processing strategy. We evaluated the models' generalization capabilities on a dataset extracted from Vanderbilt University Medical Center (VUMC) EHRs.
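The abstract does not describe the implementation, but the following is a minimal sketch of what the two processing strategies could look like. The function names, word-level splitting, and window/chunk semantics are our assumptions for illustration, not the authors' code.

```python
# Sketch of the two data processing strategies (assumed semantics):
# window-based = keep words around a candidate mention; split-based = divide
# the full note into a fixed number of chunks that fit the model's input size.

from typing import List


def window_around_mention(words: List[str], mention_idx: int, window: int = 15) -> List[str]:
    """Window-based approach (assumed): keep `window` words on each side of a mention."""
    start = max(0, mention_idx - window)
    end = min(len(words), mention_idx + window + 1)
    return words[start:end]


def split_into_chunks(words: List[str], n_chunks: int = 10) -> List[List[str]]:
    """Split-based approach (assumed): divide the note into roughly equal chunks."""
    chunk_size = max(1, -(-len(words) // n_chunks))  # ceiling division
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


note = "The patient developed a rash after starting penicillin yesterday".split()
print(window_around_mention(note, note.index("penicillin"), window=3))
print(split_into_chunks(note, n_chunks=2))
```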
Results:
On the n2c2 dataset, the best average macro F-scores of 0.832 and 0.868 were achieved using a 15-word window with PubMedBERT and a 10-chunk split with Clinical-Longformer, respectively. On the VUMC dataset, the best average macro F-scores of 0.720 and 0.786 were achieved using a 4-chunk split with PubMedBERT and with Clinical-Longformer, respectively.
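As an illustration of the reported metric, the snippet below shows one way an average macro F-score could be computed with scikit-learn. The labels, number of runs, and evaluation granularity are hypothetical; the paper's exact evaluation protocol is not specified in the abstract.

```python
# Illustration only: average macro F-score over several hypothetical runs.
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical (gold, predicted) label pairs from two fine-tuning runs.
runs = [
    ([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]),
    ([1, 0, 1, 1, 0], [1, 1, 1, 1, 0]),
]

macro_f1_per_run = [f1_score(y_true, y_pred, average="macro") for y_true, y_pred in runs]
print(f"average macro F-score: {np.mean(macro_f1_per_run):.3f}")
```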
Discussion:
Our study provided a comparative analysis of data processing methods. The fine-tuned transformer models showed good performance on ADE-related tasks. In particular, the Clinical-Longformer model with the split-based approach showed great potential for practical implementation of ADE detection. While the token limit was crucial, chunk size also significantly influenced model performance, even when the text length was within the token limit.
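To make the token-limit point concrete, the sketch below counts tokens per chunk against a Longformer-style limit. The Hugging Face model identifier and the simple word-level chunking are assumptions for illustration only, not the authors' pipeline.

```python
# Even when every chunk fits under the model's token limit, the chosen number
# of chunks changes how much context each example carries.
from transformers import AutoTokenizer

MAX_TOKENS = 4096  # Longformer-class limit; BERT short-formers are typically capped at 512

# Assumed Hugging Face model id for Clinical-Longformer; swap in the tokenizer you use.
tokenizer = AutoTokenizer.from_pretrained("yikuan8/Clinical-Longformer")


def chunk_token_counts(note_text: str, n_chunks: int) -> list:
    """Split a note into n_chunks word-level chunks and count tokens per chunk."""
    words = note_text.split()
    size = max(1, -(-len(words) // n_chunks))  # ceiling division
    chunks = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
    return [len(tokenizer.encode(chunk, add_special_tokens=True)) for chunk in chunks]


note = "Patient reports nausea and dizziness after increasing the lisinopril dose last week."
counts = chunk_token_counts(note, n_chunks=4)
print(counts, "all chunks within limit:", all(c <= MAX_TOKENS for c in counts))
```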
Conclusion:
We provided guidance on model development, including data processing methods, for ADE detection from clinical notes using transformer-based models. Our results on two datasets indicated that data processing methods and models should be selected carefully based on the type of clinical notes and on the trade-offs in allocating human and computational resources to annotation and model fine-tuning.