Annotation of epilepsy clinic letters for natural language processing

Journal of Biomedical Semantics (IF 1.6; Zone 3, Engineering and Technology; JCR Q3, Mathematical & Computational Biology) Pub Date: 2024-09-15 DOI: 10.1186/s13326-024-00316-z
Beata Fonferko-Shadrach, Huw Strafford, Carys Jones, Russell A. Khan, Sharon Brown, Jenny Edwards, Jonathan Hawken, Luke E. Shrimpton, Catharine P. White, Robert Powell, Inder M. S. Sawhney, William O. Pickrell, Arron S. Lacey

Abstract

Natural language processing (NLP) is increasingly used to extract structured information from unstructured text to assist clinical decision-making and aid healthcare research. Expert-annotated documents for developing and validating NLP applications are in limited supply. We created synthetic clinical documents to address this, and to validate the Extraction of Epilepsy Clinical Text version 2 (ExECTv2) NLP pipeline. We created 200 synthetic clinic letters based on hospital outpatient consultations with epilepsy specialists. The letters were double-annotated by trained clinicians and researchers according to agreed guidelines. We used the annotation tool Markup with an epilepsy concept list based on the Unified Medical Language System ontology. All annotations were reviewed, and a gold standard set of annotations was agreed and used to validate the performance of ExECTv2. The overall inter-annotator agreement (IAA) between the two sets of annotations produced a per-item F1 score of 0.73. Validating ExECTv2 against the gold standard gave an overall F1 score of 0.87 per item and 0.90 per letter. The synthetic letters, annotations, and annotation guidelines have been made freely available. To our knowledge, this is the first publicly available set of annotated epilepsy clinic letters and guidelines that can be used by NLP researchers with minimal epilepsy knowledge. The IAA results show that clinical text annotation tasks are difficult and require a gold standard to be established by researcher consensus. ExECTv2, our automated epilepsy NLP pipeline, extracted detailed epilepsy information from unstructured epilepsy letters more accurately than human annotators, further confirming the utility of NLP for clinical and research applications.
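The abstract reports agreement between two annotation sets as a per-item F1 score. The following is a minimal sketch of how such a score can be computed, not the authors' evaluation code. The annotation layout (letter ID, concept label, annotated text), the exact-match criterion, and the example letters are all assumptions made for illustration; the study's own matching rules are defined in its annotation guidelines.

```python
from typing import Iterable, Tuple

# Hypothetical annotation layout: (letter_id, concept_label, annotated_text).
Annotation = Tuple[str, str, str]


def f1_agreement(set_a: Iterable[Annotation], set_b: Iterable[Annotation]) -> float:
    """Per-item F1 between two annotation sets, treating set_a as the reference.

    An item counts as matched only if it appears identically in both sets
    (exact match is an assumption; real schemes may allow partial overlap).
    """
    a, b = set(set_a), set(set_b)
    if not a and not b:
        return 1.0  # nothing to annotate and nothing annotated: full agreement
    matched = len(a & b)
    precision = matched / len(b) if b else 0.0
    recall = matched / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example data: two annotators, two synthetic letters.
annotator_1 = [
    ("letter_001", "Diagnosis", "focal epilepsy"),
    ("letter_001", "Prescription", "lamotrigine 100 mg"),
    ("letter_002", "SeizureFrequency", "two seizures per month"),
]
annotator_2 = [
    ("letter_001", "Diagnosis", "focal epilepsy"),
    ("letter_001", "Prescription", "lamotrigine"),  # different span, so no exact match
    ("letter_002", "SeizureFrequency", "two seizures per month"),
]

score = f1_agreement(annotator_1, annotator_2)
print(f"per-item F1 = {score:.2f}")
```

Because F1 is symmetric in precision and recall, swapping which annotator is treated as the reference gives the same value, which is why it is convenient as an agreement measure between two human annotators as well as between a pipeline and a gold standard.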