Natural Language Processing Versus Diagnosis Code–Based Methods for Postherpetic Neuralgia Identification: Algorithm Development and Validation

JMIR Medical Informatics (impact factor 3.1; JCR Q2, Medical Informatics; Zone 3, Medicine). Published 2024-09-10. DOI: 10.2196/57949
Chengyi Zheng, Bradley Ackerson, Sijia Qiu, Lina S Sy, Leticia I Vega Daily, Jeannie Song, Lei Qian, Yi Luo, Jennifer H Ku, Yanjun Cheng, Jun Wu, Hung Fu Tseng

Abstract

Background: Algorithms that identify postherpetic neuralgia (PHN), a debilitating complication of herpes zoster (HZ), commonly rely on diagnosis codes and prescription data. Because the accuracy of these codes and prescription data is questionable, PHN is sometimes identified in electronic health records (EHRs) by manual chart review, which is costly and time-consuming.
Objective: To develop and validate a natural language processing (NLP) algorithm that automatically identifies PHN from unstructured EHR data, and to compare its performance with that of code-based methods.
Methods: This retrospective study used EHR data from Kaiser Permanente Southern California, a large integrated health care system serving more than 4.8 million members. The source population comprised members aged ≥50 years who received an incident HZ diagnosis with an accompanying antiviral prescription between 2018 and 2020 and who had ≥1 encounter within 90-180 days of the incident HZ diagnosis. The study team manually reviewed the EHRs to identify PHN cases. Random samples of 500 and 800 members were drawn from the source population for NLP development and validation, respectively. Using the chart-review results as the reference standard, the study evaluated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F-score, and Matthews correlation coefficient (MCC) of the NLP algorithm and of the code-based methods.
Results: The NLP algorithm identified PHN cases with 90.9% sensitivity, 98.5% specificity, 82.0% PPV, and 99.3% NPV; its composite scores were 0.89 (F-score) and 0.85 (MCC). The prevalence of PHN in the validation data was 6.9% by the reference standard, 7.6% by NLP, and 5.4%-13.1% by the code-based methods. The code-based methods achieved 52.7%-61.8% sensitivity, 89.8%-98.4% specificity, 27.6%-72.1% PPV, and 96.3%-97.1% NPV, with F-scores ranging from 0.45 to 0.59 and MCCs from 0.32 to 0.61.
Conclusions: The automated NLP-based approach identified PHN cases from the EHR with good accuracy. This method could be useful in population-based PHN research.
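The eligibility criteria and sampling described in the Methods can be expressed as a simple filter followed by a random draw. This is a minimal Python sketch under several assumptions that the abstract does not state: the `Member` record, its fields, and the helper names are hypothetical; age is taken at the incident HZ diagnosis; the 90-180 day window is treated as inclusive; and the development and validation samples are assumed not to overlap. It is not the study's data-extraction code.

```python
import random
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Member:
    """Hypothetical, simplified EHR extract for one health plan member."""
    member_id: str
    age_at_hz: int                          # assumed: age at incident HZ diagnosis
    hz_diagnosis_date: Optional[date]       # incident HZ diagnosis date, if any
    antiviral_with_hz: bool                 # antiviral prescribed with that diagnosis
    encounter_dates: list[date] = field(default_factory=list)


def in_source_population(m: Member) -> bool:
    """Return True if the member meets the eligibility criteria in the Methods."""
    if m.hz_diagnosis_date is None or not m.antiviral_with_hz:
        return False
    if m.age_at_hz < 50:
        return False
    if not 2018 <= m.hz_diagnosis_date.year <= 2020:
        return False
    # Assumed inclusive window: 90 to 180 days after the incident HZ diagnosis.
    start = m.hz_diagnosis_date + timedelta(days=90)
    end = m.hz_diagnosis_date + timedelta(days=180)
    return any(start <= d <= end for d in m.encounter_dates)


def draw_samples(
    population: list[Member], n_dev: int = 500, n_val: int = 800, seed: int = 0
) -> tuple[list[Member], list[Member]]:
    """Draw random development and validation samples.

    Assumes the two samples do not overlap; the abstract does not say.
    """
    rng = random.Random(seed)
    drawn = rng.sample(population, n_dev + n_val)  # sampling without replacement
    return drawn[:n_dev], drawn[n_dev:]


if __name__ == "__main__":
    hz = date(2019, 3, 1)
    members = [
        Member("A", 67, hz, True, [hz + timedelta(days=120)]),  # eligible
        Member("B", 45, hz, True, [hz + timedelta(days=120)]),  # too young
        Member("C", 72, hz, True, [hz + timedelta(days=30)]),   # no encounter in window
    ]
    print([m.member_id for m in members if in_source_population(m)])  # prints ['A']
```

Drawing both samples in one call to `sample` guarantees they are disjoint; if the study allowed overlap, the two samples would instead be drawn independently.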
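All of the accuracy measures reported in the abstract derive from a 2×2 confusion matrix that compares an algorithm's labels with the chart-review reference standard. The Python sketch here computes them from counts of true positives, false positives, false negatives, and true negatives. `ConfusionMatrix` and `validation_metrics` are illustrative names, not code from the study, and the counts in the usage example are hypothetical rather than the study's data. The abstract does not say which F-score variant was used, so the `beta` parameter (default 1, giving F1) is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from comparing an algorithm's PHN labels with chart review."""
    tp: int  # algorithm PHN, chart review PHN
    fp: int  # algorithm PHN, chart review not PHN
    fn: int  # algorithm not PHN, chart review PHN
    tn: int  # algorithm not PHN, chart review not PHN


def validation_metrics(cm: ConfusionMatrix, beta: float = 1.0) -> dict[str, float]:
    """Compute the accuracy measures named in the abstract.

    `beta` selects the F-score variant (beta=1 gives F1); which variant the
    study used is an assumption here.
    """
    tp, fp, fn, tn = cm.tp, cm.fp, cm.fn, cm.tn
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    b2 = beta * beta
    f_score = (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
        "mcc": mcc,
        "reference_prevalence": (tp + fn) / n,  # PHN share by chart review
        "algorithm_prevalence": (tp + fp) / n,  # share flagged by the algorithm
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the study's data.
    example = ConfusionMatrix(tp=40, fp=10, fn=10, tn=940)
    for name, value in validation_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

The two prevalence entries show why an algorithm's flagged rate can differ from the reference prevalence (for example, 7.6% by NLP versus 6.9% by chart review): false positives raise the flagged rate, while false negatives lower it.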
Source journal: JMIR Medical Informatics (Medicine: Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks
About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope: it places more emphasis on applications for clinicians and health professionals than on consumers/citizens, who are the focus of JMIR. It also publishes faster and accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.