Machine learning characterization of a rare neurologic disease via electronic health records: a proof-of-principle study on stiff person syndrome
Soo Hwan Park, Seo Ho Song, Frederick Burton, Cybèle Arsan, Barbara Jobst, Mary Feldman
BMC Neurology, published 2024-08-03. DOI: 10.1186/s12883-024-03760-7 (https://doi.org/10.1186/s12883-024-03760-7)
Abstract
Rare neurologic diseases (RNDs) frequently face diagnostic delays, yet they and their comorbidities remain difficult to study because their rarity leaves studies statistically underpowered. Stiff person syndrome (SPS), an RND characterized by painful muscle spasms and rigidity, affects one to two in a million people annually. Leveraging underutilized electronic health records (EHR), this study showcased a machine-learning-based framework to identify the clinical features that best characterize the diagnosis of SPS. A machine-learning-based feature selection approach was applied to 319 items from the past medical histories of 48 individuals (23 with a diagnosis of SPS and 25 controls) with elevated serum autoantibodies against glutamic-acid-decarboxylase-65 (anti-GAD65) in Dartmouth Health's EHR to determine the features with the highest discriminatory power. Each iteration of the algorithm fit a Support Vector Machine (SVM) model, generated an importance score (a SHapley Additive exPlanation, or SHAP, value) for each feature, and removed the least salient feature. Evaluation metrics were calculated through repeated stratified cross-validation. Depression, hypothyroidism, GERD, and joint pain were the most characteristic features of SPS. Using these features, the SVM model attained a precision of 0.817 (95% CI 0.795–0.840), sensitivity of 0.766 (95% CI 0.743–0.790), F-score of 0.761 (95% CI 0.744–0.778), AUC of 0.808 (95% CI 0.791–0.825), and accuracy of 0.775 (95% CI 0.759–0.790). This framework discerned features that, with further research, may help fully characterize the pathologic mechanism of SPS: depression, hypothyroidism, and GERD may represent comorbidities through common inflammatory, genetic, and dysautonomic links, respectively. This methodology could address diagnostic challenges in neurology by uncovering latent associations and generating hypotheses for RNDs.
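The elimination loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the SVM kernel, how SHAP values were aggregated, or how the confidence intervals were derived. Here a toy importance score (`prevalence_gap`, a hypothetical stand-in for SHAP values from a fitted SVM) drives the loop, `mean_ci95` assumes a normal approximation over repeated cross-validation scores, and the records and feature names are invented placeholders.

```python
import math
import statistics


def prevalence_gap(rows, labels, features):
    """Toy importance score, standing in for SHAP values from a fitted SVM:
    absolute difference in how often a feature occurs in cases vs controls."""
    cases = [r for r, y in zip(rows, labels) if y == 1]
    controls = [r for r, y in zip(rows, labels) if y == 0]
    return {
        f: abs(sum(r[f] for r in cases) / len(cases)
               - sum(r[f] for r in controls) / len(controls))
        for f in features
    }


def backward_eliminate(rows, labels, features, keep, importance_fn):
    """Re-score the surviving features each round and drop the weakest one."""
    remaining = list(features)
    while len(remaining) > keep:
        scores = importance_fn(rows, labels, remaining)
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
    return remaining


def mean_ci95(values):
    """Mean and 95% CI (normal approximation, an assumption) of CV scores."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


# Hypothetical toy records: 1 means the item appears in the medical history.
rows = [
    {"item_a": 1, "item_b": 1, "item_c": 0},
    {"item_a": 1, "item_b": 0, "item_c": 1},
    {"item_a": 0, "item_b": 0, "item_c": 1},
    {"item_a": 0, "item_b": 0, "item_c": 0},
]
labels = [1, 1, 0, 0]  # 1 = case, 0 = control

selected = backward_eliminate(
    rows, labels, list(rows[0]), keep=2, importance_fn=prevalence_gap
)
```

In a fuller version, `importance_fn` would fit an SVM on the surviving features and return a per-feature summary of its SHAP values, and the scores passed to `mean_ci95` would come from repeated stratified cross-validation folds.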
About the journal:
BMC Neurology is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of neurological disorders, as well as related molecular genetics, pathophysiology, and epidemiology.