Alissa Hutto, Tarek M Zikry, Buck Bohac, Terra Rose, Jasmine Staebler, Janet Slay, C Ray Cheever, Michael R Kosorok, Rebekah P Nash
Health Informatics Journal, volume 30, issue 4, article 14604582241296411. Published 2024-10-01. Publication type: Journal Article. Impact factor: 2.2; JCR: Q2 (Health Care Sciences & Services).
DOI: https://doi.org/10.1177/14604582241296411
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657637/pdf/
Using a natural language processing toolkit to classify electronic health records by psychiatric diagnosis.
Objective: We analyzed the ability of a natural language processing (NLP) toolkit to classify unstructured electronic health record (EHR) data by psychiatric diagnosis. The expertise required can be a barrier to using NLP. We employed an NLP toolkit (CLARK) created to support studies led by investigators with a range of informatics knowledge. Methods: The EHRs of 652 patients were manually reviewed to establish Depression and Substance Use Disorder (SUD) labeled datasets, which were split into training and evaluation datasets. We used CLARK to train depression and SUD classification models on the training datasets; model performance was analyzed against the evaluation datasets. Results: The depression model correctly classified 69% of records (sensitivity = 0.68, specificity = 0.70, F1 = 0.68). The SUD model correctly classified 84% of records (sensitivity = 0.56, specificity = 0.92, F1 = 0.57). Conclusion: The depression model's performance was more balanced, while the SUD model paired high specificity with low sensitivity. NLP applications may be especially helpful when combined with a confidence threshold for manual review.
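The metrics reported in the abstract, and the idea of a confidence threshold for manual review, can be illustrated with a minimal Python sketch. Nothing here comes from CLARK or from the study's data: the confusion-matrix counts are invented for illustration, the (record_id, label, confidence) tuple format is an assumption about what a classifier might emit, and the 0.8 threshold is arbitrary. The invented counts were chosen so that negatives dominate, which shows how accuracy and specificity can be high while sensitivity stays low.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # has the diagnosis, model said positive
    fp: int  # lacks the diagnosis, model said positive
    tn: int  # lacks the diagnosis, model said negative
    fn: int  # has the diagnosis, model said negative


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is empty.
    return numerator / denominator if denominator else 0.0


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    # Share of true cases the model finds (also called recall).
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of non-cases the model correctly rules out.
    return _ratio(c.tn, c.tn + c.fp)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and recall, written in count form.
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


# Hypothetical prediction format: (record_id, predicted_label, confidence in [0, 1]).
Prediction = Tuple[str, str, float]


def split_by_confidence(
    predictions: List[Prediction], threshold: float
) -> Tuple[List[Prediction], List[Prediction]]:
    """Keep confident predictions; route the rest to manual review."""
    auto_accepted = [p for p in predictions if p[2] >= threshold]
    manual_review = [p for p in predictions if p[2] < threshold]
    return auto_accepted, manual_review


if __name__ == "__main__":
    # Invented counts, NOT the study's results.
    counts = ConfusionCounts(tp=10, fp=5, tn=75, fn=10)
    print(f"accuracy={accuracy(counts):.2f}")
    print(f"sensitivity={sensitivity(counts):.2f}")
    print(f"specificity={specificity(counts):.2f}")
    print(f"F1={f1_score(counts):.2f}")

    # Invented predictions with an arbitrary 0.8 threshold.
    preds = [("rec-1", "SUD", 0.95), ("rec-2", "no SUD", 0.55)]
    accepted, review = split_by_confidence(preds, threshold=0.8)
    print("auto-accepted:", [p[0] for p in accepted])
    print("manual review:", [p[0] for p in review])
```

In a workflow like this, raising the threshold sends more records to human reviewers and lowering it trusts the model more often; the right setting depends on how costly a missed diagnosis is relative to reviewer time.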
About the journal:
Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.