Bridging information gaps in menopause status classification through natural language processing
Hannah Eyre, Patrick R. Alba, Carolyn J Gibson, E. Gatsby, Kristine E Lynch, Olga V. Patterson, S. Duvall
DOI: https://doi.org/10.1093/jamiaopen/ooae013
Published: 2024-02-09
Abstract
To use natural language processing (NLP) of clinical notes to augment existing structured electronic health record (EHR) data for classification of a patient’s menopausal status.
A rule-based NLP system was designed to capture evidence of a patient’s menopausal status, including the dates of the last menstrual period, reproductive surgeries, and postmenopause diagnosis, as well as use of birth control and menstrual interruptions. NLP-derived output was combined with structured EHR data to classify each patient’s menopausal status. NLP and patient classification were performed on a cohort of 307,512 female Veterans receiving healthcare at the US Department of Veterans Affairs (VA).
NLP was validated at 99.6% precision. Incorporating the NLP-derived data into a menopause phenotype increased the number of patients with data relevant to their menopausal status by 118%. Using structured codes alone, 81,173 (27.0%) patients could be classified as postmenopausal or premenopausal. With the inclusion of NLP, this number increased to 167,804 (54.6%) patients. The premenopausal category grew by 532.7% with the inclusion of NLP data.
Employing NLP made it possible to identify documented data elements that would otherwise be inaccessible for further analysis: those that predate VA care, originate outside VA networks, or have no corresponding structured field in the VA EHR.
NLP can be used to identify concepts relevant to a patient’s menopausal status in clinical notes. Adding NLP-derived data to an algorithm classifying a patient’s menopausal status significantly increases the number of patients classified using EHR data, ultimately enabling more detailed assessments of the impact of menopause on health outcomes.
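The abstract does not give the rule set or the classification logic, so the following is only a minimal sketch of how rule-based evidence extraction might be combined with a structured status. The patterns, the `Evidence` type, the `extract_evidence` and `classify` functions, and the decision order are all hypothetical illustrations, not the authors' system; in particular, the sketch ignores negation, dates, and age, which a real phenotype would need.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical patterns for the concept types named in the abstract.
# The actual rules used in the study are not described there.
PATTERNS = {
    "last_menstrual_period": re.compile(
        r"\b(?:LMP|last menstrual period)\b[:\s]*(\d{1,2}/\d{1,2}/\d{2,4})", re.I
    ),
    "reproductive_surgery": re.compile(r"\b(?:hysterectomy|oophorectomy)\b", re.I),
    "postmenopause": re.compile(r"\bpost-?menopaus(?:e|al)\b", re.I),
    "birth_control": re.compile(r"\b(?:birth control|oral contraceptive|IUD)\b", re.I),
}


@dataclass
class Evidence:
    concept: str                 # key from PATTERNS
    text: str                    # matched span in the note
    value: Optional[str] = None  # captured date, if the pattern has one


def extract_evidence(note: str) -> List[Evidence]:
    """Scan one clinical note and return every pattern match as Evidence."""
    found = []
    for concept, pattern in PATTERNS.items():
        for match in pattern.finditer(note):
            # Only the LMP pattern has a capture group (the date).
            value = match.group(1) if match.groups() else None
            found.append(Evidence(concept, match.group(0), value))
    return found


def classify(structured_status: Optional[str], evidence: List[Evidence]) -> str:
    """Assumed decision order: structured data first, then NLP evidence."""
    if structured_status is not None:
        return structured_status
    concepts = {e.concept for e in evidence}
    # Simplification: treats any surgery mention as postmenopausal evidence.
    if concepts & {"postmenopause", "reproductive_surgery"}:
        return "postmenopausal"
    if concepts & {"last_menstrual_period", "birth_control"}:
        return "premenopausal"
    return "unclassified"


if __name__ == "__main__":
    note = "Pt reports LMP: 01/15/2023, currently on oral contraceptive."
    print(classify(None, extract_evidence(note)))
```

In this sketch a patient with no structured status falls back to note-derived evidence, which mirrors the abstract's point that NLP supplies data with no corresponding structured field.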