Recognition of Time Expressions in Spanish Electronic Health Records
Marjan Najafabadipour, M. Zanin, A. R. González, C. Gonzalo-Martín, B. García, V. Calvo, J. L. Cruz-Bermúdez, M. Provencio, Ernestina Menasalvas Ruiz
2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019-06-01
DOI: 10.1109/CBMS.2019.00025 (https://doi.org/10.1109/CBMS.2019.00025)
Citations: 8
Abstract
The widespread adoption of Electronic Health Records (EHRs) is generating an ever-increasing amount of unstructured clinical text. Processing time expressions in these domain-specific texts is crucial for discovering patterns that can help detect medical events and build the patient's natural history. In the medical domain, recognizing time information in texts is challenging because of their lack of structure; their use of varied formats, styles, and abbreviations; their domain-specific nature; their writing quality; and the presence of ambiguous expressions. Furthermore, although Spanish ranks second in the world by number of native speakers, to the best of our knowledge no Natural Language Processing (NLP) tools have been introduced for recognizing time expressions in clinical texts written in this language. Therefore, in this paper, we propose a Temporal Tagger for identifying and normalizing time expressions that appear in Spanish clinical texts. We further compare our Temporal Tagger with the Spanish version of SUTime. Using a large dataset of EHRs of patients with lung cancer, we show that our Temporal Tagger, with an F1 score of 0.93, outperforms SUTime, with an F1 score of 0.797.
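To make the two tasks named in the abstract concrete (identifying a time expression and normalizing it), the following is a minimal illustrative sketch in Python. It is not the authors' Temporal Tagger, whose rules the abstract does not describe. The month lexicon, the two regular expressions, the day-first reading of numeric dates, and the names `tag` and `f1_score` are all assumptions made for illustration. The F1 helper uses the standard definition (harmonic mean of precision and recall) over sets of annotations; the exact matching criteria behind the reported 0.93 and 0.797 are not stated in the abstract.

```python
import re
from datetime import date

# Hypothetical month lexicon with a few abbreviations (assumption, not from the paper).
MONTHS = {
    "enero": 1, "ene": 1, "febrero": 2, "feb": 2, "marzo": 3, "mar": 3,
    "abril": 4, "abr": 4, "mayo": 5, "may": 5, "junio": 6, "jun": 6,
    "julio": 7, "jul": 7, "agosto": 8, "ago": 8,
    "septiembre": 9, "setiembre": 9, "sep": 9, "octubre": 10, "oct": 10,
    "noviembre": 11, "nov": 11, "diciembre": 12, "dic": 12,
}

# Longest names first so "mayo" is tried before "may", "marzo" before "mar".
MONTH_ALT = "|".join(sorted(MONTHS, key=len, reverse=True))

# Numeric dates such as 12/03/2018, read day-first (assumed Spanish convention).
NUMERIC = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")

# Textual dates such as "5 de abril de 2018" or "5 de abr. del 2018".
TEXTUAL = re.compile(
    r"\b(\d{1,2})\s+de\s+(" + MONTH_ALT + r")\.?\s+(?:de|del)\s+(\d{4})\b",
    re.IGNORECASE,
)


def tag(text):
    """Return (start, end, surface, iso_date) for each recognized date."""
    found = []
    for m in NUMERIC.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        found.append((m.start(), m.end(), day, month, year))
    for m in TEXTUAL.finditer(text):
        month = MONTHS[m.group(2).lower()]
        found.append((m.start(), m.end(), int(m.group(1)), month, int(m.group(3))))

    results = []
    for start, end, day, month, year in sorted(found):
        try:
            iso = date(year, month, day).isoformat()  # normalization step
        except ValueError:
            continue  # skip impossible dates such as 31/02/2018
        results.append((start, end, text[start:end], iso))
    return results


def f1_score(predicted, gold):
    """Standard F1 over two collections of annotations, compared as sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    note = "Ingreso el 12/03/2018. Revisión el 5 de abril de 2018."
    for span in tag(note):
        print(span)
```

A real clinical tagger would also need to cover relative expressions, durations, partial dates, and the abbreviations and ambiguities the abstract mentions; this sketch handles only two fully specified date formats.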