葡萄牙语肿瘤健康记录的生物医学实体提取管道

IF 0.4 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS Applied Computing Review Pub Date : 2023-03-27 DOI:10.1145/3555776.3578577
Hugo Sousa, Arian Pasquali, Alípio Jorge, Catarina Sousa Santos, M'ario Amorim Lopes
{"title":"葡萄牙语肿瘤健康记录的生物医学实体提取管道","authors":"Hugo Sousa, Arian Pasquali, Alípio Jorge, Catarina Sousa Santos, M'ario Amorim Lopes","doi":"10.1145/3555776.3578577","DOIUrl":null,"url":null,"abstract":"Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information of those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts, however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved F1 scores of 88.6, 95.0, and 55.8 per cent in the mention extraction of procedures, drugs, and diseases, respectively.","PeriodicalId":42971,"journal":{"name":"Applied Computing Review","volume":"52 1","pages":""},"PeriodicalIF":0.4000,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese\",\"authors\":\"Hugo Sousa, Arian Pasquali, Alípio Jorge, Catarina Sousa Santos, M'ario Amorim Lopes\",\"doi\":\"10.1145/3555776.3578577\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information of those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts, however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved F1 scores of 88.6, 95.0, and 55.8 per cent in the mention extraction of procedures, drugs, and diseases, respectively.\",\"PeriodicalId\":42971,\"journal\":{\"name\":\"Applied Computing Review\",\"volume\":\"52 1\",\"pages\":\"\"},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2023-03-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Computing Review\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3555776.3578577\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computing Review","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3555776.3578577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 1

摘要

癌症患者的文本健康记录通常是冗长且高度无结构的,这使得卫生专业人员对患者的治疗过程进行完整的概述非常耗时。由于这些限制可能导致次优和/或低效的治疗程序,医疗保健提供者将从有效总结这些记录信息的系统中受益匪浅。随着深度神经模型的出现,这一目标在英语临床文本中已经部分实现,然而,对于资源有限的语言,研究界仍然缺乏有效的解决方案。在本文中,我们提出了我们开发的方法,从欧洲葡萄牙语写的肿瘤健康记录中提取程序,药物和疾病。该项目是与葡萄牙肿瘤研究所合作开展的,该研究所除了保存了10多年得到适当保护的医疗记录外,还在整个项目开发过程中提供了肿瘤学家的专业知识。由于没有葡萄牙语生物医学实体提取的注释语料库,我们还提出了我们在为模型开发注释语料库时遵循的策略。最终的模型结合了神经结构和实体链接,在程序、药物和疾病的提及提取方面分别获得了88.6、95.0和55.8%的F1分数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
Textual health records of cancer patients are usually protracted and highly unstructured, making it very time-consuming for health professionals to get a complete overview of the patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information of those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts, however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed in annotating the corpus for the development of the models. The final models, which combined a neural architecture with entity linking, achieved F1 scores of 88.6, 95.0, and 55.8 per cent in the mention extraction of procedures, drugs, and diseases, respectively.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Applied Computing Review
Applied Computing Review COMPUTER SCIENCE, INFORMATION SYSTEMS-
自引率
40.00%
发文量
8
期刊最新文献
DIWS-LCR-Rot-hop++: A Domain-Independent Word Selector for Cross-Domain Aspect-Based Sentiment Classification Leveraging Semantic Technologies for Collaborative Inference of Threatening IoT Dependencies Relating Optimal Repairs in Ontology Engineering with Contraction Operations in Belief Change Block-RACS: Towards Reputation-Aware Client Selection and Monetization Mechanism for Federated Learning Elastic Data Binning: Time-Series Sketching for Time-Domain Astrophysics Analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1