A machine learning model for early diagnosis of type 1 Gaucher disease using real-life data
Avraham Tenenbaum, Shoshana Revel-Vilk, Sivan Gazit, Michael Roimi, Aidan Gill, Dafna Gilboa, Ora Paltiel, Orly Manor, Varda Shalev, Gabriel Chodick
Journal of Clinical Epidemiology, volume 175, Article 111517. Published 2024-09-06. DOI: 10.1016/j.jclinepi.2024.111517
https://www.sciencedirect.com/science/article/pii/S0895435624002737
Abstract
Objective
The diagnosis of Gaucher disease (GD) presents a major challenge due to the high variability and low specificity of its clinical characteristics, along with limited physician awareness of the disease’s early symptoms. Early and accurate diagnosis is important to enable effective treatment decisions, prevent unnecessary testing, and facilitate genetic counseling. This study aimed to develop a machine learning (ML) model for GD screening and GD early diagnosis based on real-world clinical data using the Maccabi Healthcare Services electronic database, which contains 20 years of longitudinal data on approximately 2.6 million patients.
Study Design and Setting
We screened the Maccabi Healthcare Services database for patients with GD between January 1998 and May 2022. Eligible controls were matched by year of birth, sex, and socioeconomic status in a 1:13 ratio. The data were partitioned into training (75%) and test (25%) sets, and models were trained to predict GD using features obtained from medical and laboratory records. Model performance was evaluated using the area under the receiver operating characteristic curve and the area under the precision-recall curve.
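The split-and-evaluate procedure described above can be sketched roughly as follows. This is a minimal illustration in plain Python on synthetic scores, not the authors' pipeline: the function names (`stratified_split`, `auroc`, `average_precision`), the random seeds, and the toy data are all assumptions. Average precision is used here as one common estimator of the area under the precision-recall curve; the abstract does not say which estimator was used.

```python
import random

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), keeping the case/control ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean precision at the rank of each true case, scores sorted descending.

    Assumes at least one case (label 1) is present.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy data only: 8 hypothetical cases and 104 controls (a 1:13 ratio),
# with synthetic model scores drawn from two overlapping Gaussians.
rng = random.Random(42)
labels = [1] * 8 + [0] * 104
scores = [rng.gauss(2.0, 1.0) if y else rng.gauss(0.0, 1.0) for y in labels]

_, test_idx = stratified_split(labels)
y_test = [labels[i] for i in test_idx]
s_test = [scores[i] for i in test_idx]
print("AUROC:", auroc(y_test, s_test))
print("AUPRC (average precision):", average_precision(y_test, s_test))
```

With a 1:13 case-to-control ratio, a classifier that scores at random has an expected area under the precision-recall curve near the case prevalence (about 0.07), which is why that metric is reported alongside the area under the receiver operating characteristic curve for imbalanced data.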
Results
We identified 264 patients with confirmed GD, to whom we matched 3,429 controls. On the test set, the best-performing model (which included known GD signs and symptoms, previously unknown clinical features, and administrative codes) achieved an area under the receiver operating characteristic curve of 0.95 ± 0.03 and an area under the precision-recall curve of 0.80 ± 0.08, and identified GD a median of 2.78 years earlier than the clinical diagnosis (25th–75th percentile: 1.29–4.53 years).
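A lead-time summary of this kind (median with 25th and 75th percentiles) could be computed along the following lines. This is only a sketch with made-up values: the helper names `lead_time_years` and `lead_time_summary`, the 365.25-day year, and the "inclusive" quantile convention are assumptions, since the abstract does not describe how lead time or percentiles were calculated.

```python
from datetime import date
from statistics import median, quantiles

def lead_time_years(flag_date, diagnosis_date):
    """Years between the model's first flag and the recorded clinical diagnosis."""
    return (diagnosis_date - flag_date).days / 365.25

def lead_time_summary(lead_times):
    """Return (median, 25th percentile, 75th percentile) of lead times in years."""
    q1, _, q3 = quantiles(lead_times, n=4, method="inclusive")
    return median(lead_times), q1, q3

# Hypothetical lead times in years for five patients (not study data).
example = [1.0, 2.0, 3.0, 4.0, 5.0]
print(lead_time_summary(example))
```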
Conclusion
Using an ML approach on real-world data led to excellent discrimination between GD patients and controls, with the ability to detect GD significantly earlier than the time of actual diagnosis. Hence, this approach might be useful as a screening tool for GD and lead to earlier diagnosis and treatment. Furthermore, advanced ML analytics may highlight previously unrecognized features associated with GD, including clinical diagnoses and health-seeking behaviors.
Plain Language Summary
Diagnosing Gaucher disease is difficult, which often leads to late or incorrect diagnoses. As a result, patients may undergo unnecessary tests and treatments and experience health deterioration despite the availability of medications for Gaucher disease. In this study, we used electronic health data to develop machine learning models for early diagnosis of Gaucher disease type 1. Our models, which included known Gaucher disease signs and symptoms, previously unknown clinical features, and administrative codes, were able to significantly outperform other models and expert opinions, detecting type 1 Gaucher disease 3 years on average before actual diagnosis. Our models also revealed new features linked to type 1 Gaucher disease, including specific diagnoses and patterns in patients’ healthcare-seeking behaviors. We believe that machine learning can be a valuable tool for patients with rare diseases.
About the journal:
The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.