Title: Exploiting word meaning for negation identification in electronic health records
Authors: Ioana Barbantan, R. Potolea
Venue: 2014 IEEE International Conference on Automation, Quality and Testing, Robotics
Publication date: 2014-05-22
DOI: https://doi.org/10.1109/AQTR.2014.6857880
Citations: 7
Abstract
Topic extraction from Electronic Health Records is a sensitive step of the knowledge extraction process. Because negations can completely distort the meaning of a discourse, correctly distinguishing terms from negated terms is mandatory. Our work is an attempt at automated negation identification in unstructured health records. We analyzed a corpus of medical documents containing 5103 sentences and found that, while adverbs have a distribution of 3%, negation covers almost 2% of the words used in the corpus, which justifies an in-depth analysis of negation. The main contribution of the paper addresses a drawback of the negation identification approaches in the literature: they do not consider negation expressed through negation prefixes. In this paper we address both syntactic and morphologic negation identification. To identify morphologic negation we propose the PreNex algorithm, which breaks each term down into a prefix and a root word and then checks the root's validity against additional available resources (WordNet). Syntactic negation identification relies on a pattern matching approach in which negated concepts are identified using a predefined list of negation identifiers. The results we obtained are promising and indicate a reliable negation identification approach for medical documents. We report a precision of 92.62% and a recall of 93.60% for morphologic negation identification, and an overall performance for combined morphologic and syntactic negation identification of 95.96% precision and 94.23% recall.
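The two identification steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' PreNex implementation: the prefix list, the cue list, the three-token scope window, and all function names below are assumptions made for illustration, and a small toy lexicon stands in for the WordNet lookup so that the example runs without external resources.

```python
import re

# Hypothetical prefix list; the abstract does not enumerate the prefixes PreNex uses.
NEGATION_PREFIXES = ("non", "dis", "un", "in", "im", "ir", "il", "a")

# Toy stand-in for WordNet, which the abstract names as the validation resource.
TOY_LEXICON = {"responsive", "regular", "stable", "possible",
               "tender", "symptomatic", "fever", "pain"}

# Hypothetical negation identifiers and scope size for the pattern matching step.
NEGATION_CUES = ("no", "not", "without", "denies", "negative for")
SCOPE_WINDOW = 3


def morphological_negation(term, lexicon=TOY_LEXICON):
    """Split a term into (prefix, root) if the root is a valid word, else None."""
    term = term.lower()
    # Try longer prefixes first so "non" is tested before shorter candidates.
    for prefix in sorted(NEGATION_PREFIXES, key=len, reverse=True):
        if term.startswith(prefix):
            root = term[len(prefix):].lstrip("-")  # handles "non-tender"
            if root in lexicon:
                return prefix, root
    return None


def syntactic_negation(sentence, cues=NEGATION_CUES, window=SCOPE_WINDOW):
    """Return (cue, following tokens) pairs for each negation cue in a sentence."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    hits = []
    for cue in cues:
        cue_tokens = cue.split()
        n = len(cue_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == cue_tokens:
                scope = tokens[i + n:i + n + window]
                if scope:
                    hits.append((cue, scope))
    return hits


if __name__ == "__main__":
    for word in ("unresponsive", "irregular", "insulin", "non-tender"):
        print(word, "->", morphological_negation(word))
    print(syntactic_negation("Patient denies chest pain."))
```

The root validity check is what keeps a word such as "insulin" from being read as "in" + "sulin". A plain lexicon lookup would still accept words like "inform" ("in" + "form"), so a real system needs further filtering beyond this sketch.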