Machine learning characterization of a rare neurologic disease via electronic health records: a proof-of-principle study on stiff person syndrome

IF 2.2 | Zone 3 (Medicine) | JCR Q3 (Clinical Neurology) | BMC Neurology | Pub Date: 2024-08-03 | DOI: 10.1186/s12883-024-03760-7
Soo Hwan Park, Seo Ho Song, Frederick Burton, Cybèle Arsan, Barbara Jobst, Mary Feldman

Abstract

Despite frequent diagnostic delays in rare neurologic diseases (RNDs), RNDs and their comorbidities remain difficult to study because their rarity leaves studies statistically underpowered. Affecting one to two in a million annually, stiff person syndrome (SPS) is an RND characterized by painful muscle spasms and rigidity. Leveraging underutilized electronic health records (EHR), this study showcased a machine-learning-based framework to identify the clinical features that best characterize the diagnosis of SPS. A machine-learning-based feature selection approach was applied to 319 items from the past medical histories of 48 individuals in Dartmouth Health’s EHR (23 with a diagnosis of SPS and 25 controls), all with elevated serum autoantibodies against glutamic-acid-decarboxylase-65 (anti-GAD65), to determine the features with the highest discriminatory power. Each iteration of the algorithm fit a Support Vector Machine (SVM) model, generated an importance score (a SHapley Additive exPlanation, or SHAP, value) for each feature, and removed the least salient feature. Evaluation metrics were calculated through repeated stratified cross-validation. Depression, hypothyroidism, GERD, and joint pain were the most characteristic features of SPS. Using these features, the SVM model attained a precision of 0.817 (95% CI 0.795–0.840), sensitivity of 0.766 (95% CI 0.743–0.790), F-score of 0.761 (95% CI 0.744–0.778), AUC of 0.808 (95% CI 0.791–0.825), and accuracy of 0.775 (95% CI 0.759–0.790). This framework discerned features that, with further research, may help fully characterize the pathologic mechanism of SPS: depression, hypothyroidism, and GERD may represent comorbidities through common inflammatory, genetic, and dysautonomic links, respectively. This methodology could address diagnostic challenges in neurology by uncovering latent associations and generating hypotheses for RNDs.
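The elimination loop described in the abstract (fit an SVM, score each feature, drop the least salient, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the SVM kernel, so a linear SVM is assumed; SHAP values use the closed form for a linear model with independent features, phi_ij = w_j (x_ij - mean_j); the cross-validation step is omitted; and the function names and toy data are hypothetical.

```python
from statistics import mean

def fit_linear_svm(X, y, lr=0.1, lam=0.01, steps=500):
    """Full-batch subgradient descent on the L2-regularised hinge loss; y in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Only samples inside the margin (or misclassified) contribute to the update.
        active = [i for i in range(n)
                  if y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b) < 1]
        w = [wj - lr * (lam * wj - sum(y[i] * X[i][j] for i in active) / n)
             for j, wj in enumerate(w)]
        b += lr * sum(y[i] for i in active) / n
    return w, b

def mean_abs_linear_shap(X, w):
    """Mean |SHAP| per feature for a linear model (assumes independent features)."""
    cols = list(zip(*X))
    return [mean(abs(wj * (x - mean(col))) for x in col) for wj, col in zip(w, cols)]

def backward_eliminate(X, y, names, n_keep):
    """Refit, score, and drop the least important feature until n_keep remain."""
    names = list(names)
    X = [list(row) for row in X]
    while len(names) > n_keep:
        w, _ = fit_linear_svm(X, y)
        scores = mean_abs_linear_shap(X, w)
        drop = scores.index(min(scores))  # least salient feature this round
        names.pop(drop)
        X = [row[:drop] + row[drop + 1:] for row in X]
    return names

# Toy data (not from the study): f0 tracks the label, f1 is balanced noise, f2 is constant.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0],
     [0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
kept = backward_eliminate(X, y, ["f0", "f1", "f2"], n_keep=1)
print(kept)
```

In a fuller version, each round would also be scored with repeated stratified cross-validation, as the abstract describes, so that the number of retained features can be chosen from the evaluation metrics rather than fixed in advance.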
Source journal: BMC Neurology (Medicine, Clinical Neurology)
CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: 428
Review time: 3-8 weeks
Journal description: BMC Neurology is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of neurological disorders, as well as related molecular genetics, pathophysiology, and epidemiology.