Soo Hwan Park, Seo Ho Song, Frederick Burton, Cybèle Arsan, Barbara Jobst, Mary Feldman
BMC Neurology (JCR category: Clinical Neurology, Q3; impact factor 2.2). Published 2024-08-03. DOI: 10.1186/s12883-024-03760-7 (https://doi.org/10.1186/s12883-024-03760-7)
Machine learning characterization of a rare neurologic disease via electronic health records: a proof-of-principle study on stiff person syndrome
Although rare neurologic diseases (RNDs) are frequently diagnosed late, RNDs and their comorbidities remain difficult to study because their rarity leaves studies statistically underpowered. Affecting one to two in a million annually, stiff person syndrome (SPS) is an RND characterized by painful muscle spasms and rigidity. Leveraging underutilized electronic health records (EHR), this study presents a machine-learning-based framework for identifying the clinical features that best characterize a diagnosis of SPS. A machine-learning-based feature selection approach was applied to 319 items from the past medical histories of 48 individuals (23 with a diagnosis of SPS and 25 controls) with elevated serum autoantibodies against glutamic-acid-decarboxylase-65 (anti-GAD65) in Dartmouth Health’s EHR to determine the features with the highest discriminatory power. Each iteration of the algorithm fitted a Support Vector Machine (SVM) model, generated an importance score (a SHapley Additive exPlanation, or SHAP, value) for each feature, and removed the least salient feature. Evaluation metrics were calculated through repeated stratified cross-validation. Depression, hypothyroidism, GERD, and joint pain were the most characteristic features of SPS. Using these features, the SVM model attained a precision of 0.817 (95% CI 0.795–0.840), sensitivity of 0.766 (95% CI 0.743–0.790), F-score of 0.761 (95% CI 0.744–0.778), AUC of 0.808 (95% CI 0.791–0.825), and accuracy of 0.775 (95% CI 0.759–0.790). The framework identified features that, with further research, may help characterize the pathologic mechanism of SPS: depression, hypothyroidism, and GERD may represent comorbidities linked to SPS through common inflammatory, genetic, and dysautonomic mechanisms, respectively. This methodology could help address diagnostic challenges in neurology by uncovering latent associations and generating hypotheses for RNDs.
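The elimination loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The SVM and SHAP steps are replaced by a simple stand-in importance score (the gap in item prevalence between cases and controls), the data are made-up toy values rather than study data, and every name (`prevalence_gap`, `backward_eliminate`, `binary_metrics`, `item_a` and so on) is hypothetical. The F-score is assumed here to be F1, and cross-validation and AUC are omitted for brevity.

```python
from typing import Callable, Dict, List

Matrix = List[List[int]]  # rows = patients, columns = binary history items


def prevalence_gap(X: Matrix, y: List[int]) -> List[float]:
    """Stand-in importance: |P(item | case) - P(item | control)| per column.

    This is NOT a SHAP value; it only gives the loop something to rank.
    Assumes both classes are present in y.
    """
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    return [
        abs(sum(r[c] for r in cases) / len(cases)
            - sum(r[c] for r in controls) / len(controls))
        for c in range(len(X[0]))
    ]


def backward_eliminate(
    X: Matrix,
    y: List[int],
    names: List[str],
    keep: int,
    importance: Callable[[Matrix, List[int]], List[float]] = prevalence_gap,
) -> List[str]:
    """Rescore the surviving features each round and drop the weakest one."""
    active = list(range(len(names)))  # column indices still in play
    while len(active) > keep:
        X_sub = [[row[c] for c in active] for row in X]
        scores = importance(X_sub, y)  # a model-based scorer could go here
        del active[scores.index(min(scores))]
    return [names[c] for c in active]


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Precision, sensitivity (recall), F1 and accuracy from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": (tp + tn) / len(pairs)}


# Toy data (invented for illustration): 3 cases followed by 3 controls.
names_toy = ["item_a", "item_b", "item_c", "item_d"]
X_toy = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
y_toy = [1, 1, 1, 0, 0, 0]

selected = backward_eliminate(X_toy, y_toy, names_toy, keep=2)
print(selected)
```

Because `importance` receives only the surviving columns on each round, a scorer that refits a model and derives per-feature importances from it could be swapped in without changing the loop.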
About the journal:
BMC Neurology is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of neurological disorders, as well as related molecular genetics, pathophysiology, and epidemiology.