Estimating prevalence of rare genetic disease diagnoses using electronic health records in a children's hospital.

HGG Advances | Published: 2024-08-14 | DOI: 10.1016/j.xhgg.2024.100341 | Impact factor 3.3 (JCR Q2, Genetics & Heredity)
Kate Herr, Peixin Lu, Kessi Diamreyan, Huan Xu, Eneida Mendonca, K Nicole Weaver, Jing Chen
Citations: 0

Abstract

Rare genetic diseases (RGDs) affect a substantial number of individuals, particularly in pediatric populations. This study evaluates how effectively RGD diagnoses can be identified from electronic health records (EHRs) using natural language processing (NLP) tools, and analyzes the prevalence of the identified RGDs at Cincinnati Children's Hospital Medical Center (CCHMC) to detect potential underdiagnosis. EHR data from 659,139 pediatric patients at CCHMC were used. Diagnoses corresponding to RGDs listed in Orphanet were identified using a rule-based NLP method and a machine-learning-based NLP method. The precision of each method was assessed by manual review of 100 diagnosis descriptions per method. The rule-based method achieved a precision of 97.5% (95% CI: 91.5%, 99.4%), whereas the machine-learning-based method achieved 73.5% (95% CI: 63.6%, 81.6%). A manual chart review of 70 randomly selected patients with RGD diagnoses confirmed the diagnoses in 90.3% (95% CI: 82.0%, 95.2%) of cases. Using the rule-based method, 37,326 pediatric patients were identified with one or more of 977 distinct RGD diagnoses, a prevalence of 5.66% in this population. Although most disorders showed a higher prevalence at CCHMC than reported by Orphanet, the results for some diseases, such as 1p36 deletion syndrome, suggested potential underdiagnosis. The analyses further revealed disparities in RGD prevalence and age at diagnosis across gender and racial groups. This study demonstrates the utility of combining EHR data with NLP tools to systematically investigate RGD diagnoses in large cohorts. The identified disparities underscore the need for improved approaches to ensure timely and accurate diagnosis and management of pediatric RGDs.
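The abstract does not describe the matching rules, the disease vocabulary as loaded, or how the confidence intervals were computed. Purely as an illustration, the sketch below assumes (1) a rule of exact matching after text normalization against a small hypothetical lexicon standing in for Orphanet disease names, (2) prevalence as the share of patients with at least one matched diagnosis, and (3) Wilson score intervals for a precision estimate. All names here (`EXAMPLE_LEXICON`, `normalize`, `build_index`, `match_diagnosis`, `patients_with_rgd`, `prevalence`, `wilson_interval`) and the placeholder codes `RGD_A`/`RGD_B` are hypothetical; this is not the authors' pipeline.

```python
import math
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical lexicon: real Orphanet names, synonyms, and ORPHA codes are not
# reproduced here; "RGD_A" and "RGD_B" are placeholder identifiers.
EXAMPLE_LEXICON = {
    "1p36 deletion syndrome": "RGD_A",
    "Monosomy 1p36": "RGD_A",
    "Example syndrome B": "RGD_B",
}


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def build_index(lexicon: Dict[str, str]) -> Dict[str, str]:
    """Map each normalized disease name or synonym to its disease identifier."""
    return {normalize(name): code for name, code in lexicon.items()}


def match_diagnosis(description: str, index: Dict[str, str]) -> Optional[str]:
    """Assumed rule: exact match after normalization; None when nothing matches."""
    return index.get(normalize(description))


def patients_with_rgd(
    records: Iterable[Tuple[str, str]], index: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Collect the set of matched disease identifiers for each patient ID."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for patient_id, description in records:
        code = match_diagnosis(description, index)
        if code is not None:
            hits[patient_id].add(code)
    return dict(hits)


def prevalence(n_cases: int, n_population: int) -> float:
    """Proportion of the population with at least one matched diagnosis."""
    return n_cases / n_population


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    index = build_index(EXAMPLE_LEXICON)
    records = [
        ("p1", "1p36-deletion syndrome"),
        ("p2", "monosomy 1p36"),
        ("p2", "asthma"),
        ("p3", "Example Syndrome B"),
    ]
    print(patients_with_rgd(records, index))
    # Abstract's headline figure: 37,326 of 659,139 patients.
    print(round(prevalence(37326, 659139) * 100, 2))  # -> 5.66
```

The prevalence call reproduces only the abstract's arithmetic (37,326 / 659,139 ≈ 5.66%). Because the abstract reports neither the precision numerators nor the interval method, `wilson_interval` should not be expected to reproduce the published confidence intervals. Exact matching after normalization trades recall for precision: it misses paraphrases and misspellings that a learned model might catch.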
