Research on Relation Extraction Method of Chinese Electronic Medical Records Based on BERT

Shengxin Gao, Jinlian Du, Xiao Zhang
DOI: 10.1145/3404555.3404635
Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence, 2020-04-23
Citations: 6
Abstract
Relation extraction is a necessary step in obtaining information from electronic medical records. Deep learning methods for relation extraction are primarily based on word2vec embeddings combined with convolutional or recurrent neural networks. However, the word vectors generated by word2vec are static and cannot reflect the different meanings a polysemous word takes in different contexts, and the feature extraction ability of RNNs (Recurrent Neural Networks) is limited. Meanwhile, the BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model has achieved excellent results on many natural language processing tasks. In this paper, we propose a medical relation extraction model based on BERT. We combine the sentence-level information obtained from the pre-trained language model with the corresponding information of the two medical entities to complete the relation extraction task. The experimental data were obtained from Chinese electronic medical records provided by a hospital in Beijing. Experimental results on these records show that our model's accuracy, precision, recall, and F1-score reach 67.37%, 69.54%, 67.38%, and 68.44%, respectively, which are higher than those of the other three methods compared. Because named entity recognition is a prerequisite for relation extraction, we plan to combine the model with named entity recognition in future work.
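The combination step the abstract describes — fusing a sentence-level representation from the pre-trained encoder with representations of the two medical entities before classification — can be sketched as follows. This is a minimal PyTorch illustration, not the paper's exact architecture: the pooling choices ([CLS]-position vector for the sentence, mean pooling over each entity span), the hidden size, and the number of relation classes are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    """Illustrative head: concatenate [sentence ; entity1 ; entity2]
    vectors and classify the relation. Sizes are hypothetical."""

    def __init__(self, hidden_size=768, num_relations=5):
        super().__init__()
        # Input is 3 * hidden_size: sentence vector plus two entity vectors.
        self.classifier = nn.Linear(3 * hidden_size, num_relations)

    def forward(self, token_states, entity1_mask, entity2_mask):
        # token_states: (batch, seq_len, hidden) from a BERT-style encoder.
        # entity*_mask: (batch, seq_len) with 1 on the entity's tokens.
        sent_vec = token_states[:, 0]  # vector at the [CLS] position

        def span_mean(mask):
            # Mean-pool the hidden states over the masked entity span.
            mask = mask.unsqueeze(-1).float()
            return (token_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

        e1 = span_mean(entity1_mask)
        e2 = span_mean(entity2_mask)
        combined = torch.cat([sent_vec, e1, e2], dim=-1)
        return self.classifier(combined)  # (batch, num_relations) logits
```

In practice `token_states` would come from a Chinese BERT encoder run over the medical-record sentence; here any tensor of the right shape demonstrates the data flow.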