STMC: Semantic Tag Medical Concept Using Word2Vec Representation

I. M. Soriano, J. Castro
Proceedings. IEEE International Symposium on Computer-Based Medical Systems, pp. 393-398. Published 2018-06-18. DOI: 10.1109/CBMS.2018.00075 (https://doi.org/10.1109/CBMS.2018.00075)
Citations: 3

Abstract

In this paper we propose a system for recognizing medical concepts in free-text clinical reports. Our approach also aims to recognize concepts that are expressed in local terminology, medical writing scripts, short words, abbreviations, and even misspellings. We use a clinical terminology ontology (Snomed-CT) as a dictionary of concepts. In a first step, we obtain an embedding model by applying the word2vec methodology to a large corpus of clinical reports. Word vectors are positioned in the vector space so that words sharing common contexts in the corpus lie close to one another, and so geometric similarity can be taken as a measure of semantic relatedness. The corpus consists of 615,513 emergency clinical reports from the Hospital "Rafael Mendez" in Lorca, Murcia. These reports contain a great deal of local emergency-domain language, medical writing scripts, short words, abbreviations, and even misspellings. With the resulting model we represent words and sentences as vectors and, by applying cosine similarity, identify which concepts of the ontology are named in the text. Finally, we represent each clinical report (EHR) as a bag of concepts and use this representation to search for similar documents. The paper illustrates 1) how we build the word2vec model from the free-text clinical reports, 2) how we extend the embedding from words to sentences, and 3) how we use cosine similarity to identify concepts. The experiments, together with validation by human experts, show that a) concepts named in the text with the ontology terminology are well recognized, and b) other concepts that are not named with the ontology terminology are also recognized, with high precision and recall.
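The pipeline summarized in the abstract (word vectors, sentence vectors, cosine matching against a concept dictionary, bag of concepts) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy vectors, the concept dictionary, the function names, and the 0.9 threshold are all invented for the example, and averaging word vectors is merely one assumed way to extend the embedding from words to sentences, since the abstract does not say which method the paper uses.

```python
import math

# Hypothetical toy embeddings standing in for a trained word2vec model.
# All values are invented for illustration; none come from the paper.
WORD_VECTORS = {
    "dolor":    [0.9, 0.1, 0.0],
    "toracico": [0.1, 0.9, 0.0],
    "torax":    [0.2, 0.8, 0.1],
    "dlr":      [0.8, 0.2, 0.1],  # abbreviation assumed to share contexts with "dolor"
    "fiebre":   [0.0, 0.1, 0.9],
    "febril":   [0.1, 0.0, 0.8],
}

# Hypothetical concept dictionary: concept label -> description text.
CONCEPTS = {
    "chest pain": "dolor toracico",
    "fever": "fiebre",
}


def sentence_vector(text):
    """Average the vectors of the known words (an assumed composition rule)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # no word of the text is in the vocabulary
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_concepts(phrase, threshold=0.9):
    """Return (label, score) pairs for concepts whose description is close to the phrase."""
    vec = sentence_vector(phrase)
    if vec is None:
        return []
    scored = []
    for label, description in CONCEPTS.items():
        concept_vec = sentence_vector(description)
        if concept_vec is None:
            continue
        score = cosine(vec, concept_vec)
        if score >= threshold:
            scored.append((label, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def bag_of_concepts(phrases, threshold=0.9):
    """Count how often each concept is recognized across the phrases of one report."""
    bag = {}
    for phrase in phrases:
        for label, _ in tag_concepts(phrase, threshold):
            bag[label] = bag.get(label, 0) + 1
    return bag


report = ["dlr torax", "febril", "paciente"]
print(bag_of_concepts(report))
```

In this toy setting the abbreviated phrase "dlr torax" shares no word with the description "dolor toracico" and is matched only through vector proximity, which is the kind of case the abstract describes for concepts not named with the ontology terminology. In practice the threshold would have to be tuned against expert-validated data.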