Automatic Extraction of Nested Entities in Clinical Referrals in Spanish

P. Baez, Felipe Bravo-Marquez
Journal: ACM Transactions on Computing for Healthcare (HEALTH)
Publication date: 2022-04-07
Publication type: Journal Article
DOI: 10.1145/3498324 (https://doi.org/10.1145/3498324)
Citations: 11

Abstract

Here we describe a new clinical corpus rich in nested entities and a series of neural models to identify them. The corpus comprises de-identified referrals from the waiting list in Chilean public hospitals. A subset of 5,000 referrals (58.6% medical and 41.4% dental) was manually annotated with 10 types of entities, six attributes, and pairs of relations with clinical relevance. In total, there are 110,771 annotated tokens. A trained medical doctor or dentist annotated these referrals, and then, together with three other researchers, consolidated each of the annotations. The annotated corpus has 48.17% of entities embedded in other entities or containing another one. We use this corpus to build models for Named Entity Recognition (NER). The best results were achieved using a Multiple Single-entity architecture with clinical word embeddings stacked with character and Flair contextual embeddings. The entity with the best performance is abbreviation, and the hardest to recognize is finding. NER models applied to this corpus can leverage statistics of diseases and pending procedures. This work constitutes the first annotated corpus using clinical narratives from Chile and one of the few in Spanish. The annotated corpus, clinical word embeddings, annotation guidelines, and neural models are freely released to the community.
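
To make the notion of nesting concrete, the following is a minimal sketch, not the authors' released code. It assumes entities are stored as token-level spans `(start, end_exclusive, label)`. The example phrase, the label names `Disease` and `Body_Part`, and the helper functions are all hypothetical and chosen only for illustration. The sketch computes the share of entities that contain, or are contained in, another entity (the kind of statistic reported as 48.17% above). It also builds one BIO tag sequence per entity type, which is one plausible reading of how a "Multiple Single-entity" setup lets overlapping entities of different types coexist.

```python
from collections import defaultdict

# Hypothetical example phrase and annotations (not taken from the corpus).
tokens = ["cáncer", "de", "mama", "izquierda"]
entities = [
    (0, 4, "Disease"),    # "cáncer de mama izquierda"
    (2, 4, "Body_Part"),  # "mama izquierda", nested inside the disease span
]


def contains(outer, inner):
    """True if `outer` covers `inner` and they are different annotations."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def nested_fraction(spans):
    """Share of entities that contain, or are contained in, another entity."""
    if not spans:
        return 0.0
    involved = sum(
        1
        for a in spans
        if any(contains(a, b) or contains(b, a) for b in spans)
    )
    return involved / len(spans)


def bio_per_type(n_tokens, spans):
    """Build one BIO sequence per entity type.

    Each type gets its own tag sequence, so spans of different types that
    overlap never compete for the same tag slot.
    """
    tags = defaultdict(lambda: ["O"] * n_tokens)
    for start, end, label in spans:
        tags[label][start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[label][i] = f"I-{label}"
    return dict(tags)


if __name__ == "__main__":
    print(nested_fraction(entities))
    for label, seq in bio_per_type(len(tokens), entities).items():
        print(label, list(zip(tokens, seq)))
```

A single flat BIO sequence could not encode both spans here, because the tokens "mama izquierda" would need two tags at once. Keeping a separate sequence per type sidesteps that conflict. Note that this per-type encoding does not handle nesting between two entities of the same type.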