I. M. Soriano, J. Castro, J. Fernández-Breis, I. S. Román, A. A. Barriuso, David Guevara Baraza
Title: Snomed2Vec: Representation of SNOMED CT Terms with Word2Vec
Venue: Proceedings. IEEE International Symposium on Computer-Based Medical Systems
Volume: 27 1
Pages: 678-683
Published: 2019-06-01
DOI: 10.1109/CBMS.2019.00138 (https://doi.org/10.1109/CBMS.2019.00138)
Citations: 9
Abstract
Hospital Information Systems (HIS) use Electronic Health Records to store heterogeneous patient data. An important goal of such systems is that the information be normalized and coded with a clinical terminology so that it represents the clinical meaning exactly. This process usually requires human experts to identify and map the correct concept, which is slow and tedious. SNOMED CT is one of the most widespread clinical terminologies and one of those with the strongest outlook. It is a multilingual, ontology-based clinical terminology that represents each clinical concept with a unique code. In this paper we introduce Snomed2Vec, a new semantic search tool for finding the most similar concepts in SNOMED CT. It is an ontology-based named entity recognition system that uses word embeddings to suggest the concepts most similar to those appearing in a text. To evaluate the tool we propose two kinds of validation: one against a gold-standard corpus of diagnoses from clinical reports, and a social validation through free public web access. We make a web interface available to the academic community to use, test, and validate the tool. The validation results show that this process helps specialists choose the correct concepts from SNOMED CT. The paper illustrates 1) how we create the initial large text corpus used to train the word2vec models, 2) how we use this vector space model to create our final Snomed2Vec vector space model, and 3) how we use cosine similarity to obtain the most similar concepts, grouped by SNOMED CT hierarchy. The public web tool and the notebook used to develop and test this work are available at https://github.com/NachusS/Snomed2Vec.
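The retrieval step described in the abstract (cosine similarity over concept vectors, grouped by hierarchy) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the word vectors are made-up toy values standing in for a trained word2vec model, the concept identifiers and hierarchy labels are hypothetical rather than real SNOMED CT codes, and building a concept vector by averaging the word vectors of its description is an assumption, since the abstract does not say how the Snomed2Vec vectors are composed.

```python
import math
from collections import defaultdict

# Toy word vectors standing in for a trained word2vec model (made-up values).
WORD_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "attack": [0.7, 0.3, 0.1],
    "myocardial": [0.85, 0.15, 0.05],
    "infarction": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.9, 0.2],
    "femur": [0.1, 0.8, 0.3],
    "structure": [0.3, 0.3, 0.9],
}

# Hypothetical concepts: (identifier, description, hierarchy).
# These are placeholders, not real SNOMED CT codes.
CONCEPTS = [
    ("C1", "myocardial infarction", "disorder"),
    ("C2", "fracture femur", "disorder"),
    ("C3", "heart structure", "body structure"),
    ("C4", "femur structure", "body structure"),
]


def mean_vector(text):
    """Average the vectors of the known words in text (assumed composition rule)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # no known word, so no vector can be built
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_by_hierarchy(query, top_k=1):
    """Rank concepts by cosine similarity to the query, grouped by hierarchy."""
    query_vector = mean_vector(query)
    if query_vector is None:
        return {}
    grouped = defaultdict(list)
    for concept_id, description, hierarchy in CONCEPTS:
        concept_vector = mean_vector(description)
        if concept_vector is not None:
            score = cosine(query_vector, concept_vector)
            grouped[hierarchy].append((score, concept_id, description))
    # Keep the top_k highest-scoring concepts in each hierarchy.
    return {h: sorted(items, reverse=True)[:top_k] for h, items in grouped.items()}


result = most_similar_by_hierarchy("heart attack")
for hierarchy, matches in result.items():
    for score, concept_id, description in matches:
        print(f"{hierarchy}: {concept_id} ({description}) score={score:.3f}")
```

With these toy vectors, the query "heart attack" ranks the hypothetical concept C1 first among disorders and C3 first among body structures. A real system would load vectors from a trained model and descriptions from a SNOMED CT release instead of the hard-coded dictionaries.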