Research on Relation Extraction Method of Chinese Electronic Medical Records Based on BERT

Shengxin Gao, Jinlian Du, Xiao Zhang
DOI: 10.1145/3404555.3404635
Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence, published 2020-04-23.
Citations: 6

Abstract

Relation extraction is a necessary step in obtaining information from electronic medical records. Deep learning methods for relation extraction are primarily based on word2vec combined with convolutional or recurrent neural networks. However, the word vectors generated by word2vec are static and cannot reflect the different meanings a polysemous word takes in different contexts, and the feature extraction ability of RNNs (Recurrent Neural Networks) is limited. At the same time, the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model has achieved excellent results on many natural language processing tasks. In this paper, we propose a medical relation extraction model based on BERT. We combine the whole-sentence information obtained from the pre-trained language model with the representations of the two medical entities to complete the relation extraction task. The experimental data were Chinese electronic medical records provided by a hospital in Beijing. Experimental results on these records show that our model's accuracy, precision, recall, and F1-score reach 67.37%, 69.54%, 67.38%, and 68.44%, respectively, higher than the three baseline methods. Because named entity recognition is a prerequisite for relation extraction, we will combine the model with named entity recognition in future work.
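The fusion step the abstract describes — combining the sentence-level representation from the encoder with the representations of the two medical entities before classifying the relation — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoder outputs are random stand-ins, and mean-pooling over entity spans plus a single softmax layer are assumptions, since the exact classification head is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 8    # encoder hidden size (BERT-base would use 768)
NUM_REL = 4   # number of relation labels in the schema (illustrative)

def extract_relation(hidden_states, e1_span, e2_span, W, b):
    """Fuse sentence-level and entity-level features, then classify.

    hidden_states: (seq_len, HIDDEN) token vectors from the encoder.
    e1_span, e2_span: (start, end) token indices of the two entities.
    """
    cls_vec = hidden_states[0]                              # [CLS]: whole-sentence info
    e1_vec = hidden_states[e1_span[0]:e1_span[1]].mean(axis=0)  # pooled entity 1
    e2_vec = hidden_states[e2_span[0]:e2_span[1]].mean(axis=0)  # pooled entity 2
    features = np.concatenate([cls_vec, e1_vec, e2_vec])    # (3 * HIDDEN,)
    logits = features @ W + b                               # (NUM_REL,)
    exp = np.exp(logits - logits.max())                     # stable softmax
    return exp / exp.sum()

# Toy forward pass: random "encoder outputs" and classifier weights.
hidden = rng.normal(size=(16, HIDDEN))
W = rng.normal(size=(3 * HIDDEN, NUM_REL))
b = np.zeros(NUM_REL)
probs = extract_relation(hidden, (3, 5), (9, 12), W, b)
print(probs.shape)  # one probability per relation label
```

In practice the `hidden_states` would come from a BERT encoder and `W`, `b` would be trained jointly with fine-tuning; the sketch only shows how sentence and entity features are concatenated into one feature vector for the relation classifier.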