A machine learning algorithm for the detection of paroxysmal nocturnal haemoglobinuria (PNH) in UK primary care electronic health records

Orphanet Journal of Rare Diseases · Published: 2024-10-13 · DOI: 10.1186/s13023-024-03406-4 · Impact factor 3.4 · CAS ranking: Zone 2, Medicine · JCR Q2 (Genetics & Heredity)

Amanda Worker, Hadley Mahon, Jack Sams, Freya Boardman-Pretty, Elena Marchini, Rand Dubis, Alan Warren, Jez Stockdale, Jyothika Kumar, Elizabeth Varones, Daniel Ollerenshaw, Calum Grant, Peter Fish, Richard J Kelly

Full-text PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479535/pdf/

Citations: 0

Abstract

Background: Paroxysmal Nocturnal Haemoglobinuria (PNH) is an ultra-rare, acquired disorder that is challenging to diagnose due to varied symptoms, heterogeneous patient presentations, and lack of awareness among healthcare professionals. These factors lead to frequent misdiagnosis and diagnostic delays. This study evaluated the feasibility of using a machine learning model to identify undiagnosed PNH patients from structured electronic health records.

Methods: The study used data from the Optimum Patient Care Research Database, which contains electronic health records from general practitioner (GP) practices across the United Kingdom. PNH patients were identified by the presence, and control patients by the absence, of a PNH diagnosis code in their records. Clinical features (symptoms, diagnoses, healthcare utilisation) from 131 patients in the PNH group and 593,838 patients in the control group were input into a tree-based XGBoost machine learning model that classified patients as either "positive" or "negative" for PNH suspicion. The algorithm was finalised after additional exclusions and inclusions were applied. Performance was assessed using positive predictive value (PPV), recall and specificity. Because the sample used to develop the algorithm did not reflect the true population prevalence of PNH, PPV was additionally adjusted to reflect performance in the wider population.
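The three performance measures named in the Methods follow the usual confusion-matrix definitions, with the recorded PNH diagnosis code serving as ground truth. The Python sketch below writes them out; the `ConfusionCounts` type and the counts in the usage example are hypothetical illustrations, not the study's confusion matrix, which the abstract does not report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical counts from comparing model flags with PNH diagnosis codes."""
    tp: int  # PNH-coded patients flagged "positive"
    fn: int  # PNH-coded patients flagged "negative"
    fp: int  # control patients flagged "positive"
    tn: int  # control patients flagged "negative"


def recall(c: ConfusionCounts) -> float:
    """Share of PNH-group patients the model flags (sensitivity)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of control-group patients the model leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Share of flagged patients who carry a PNH code (precision)."""
    return c.tp / (c.tp + c.fp)


if __name__ == "__main__":
    # Illustrative counts only; chosen to make the arithmetic easy to follow.
    counts = ConfusionCounts(tp=30, fn=70, fp=20, tn=99_980)
    print(f"recall={recall(counts):.3f}")            # recall=0.300
    print(f"specificity={specificity(counts):.5f}")  # specificity=0.99980
    print(f"PPV={ppv(counts):.3f}")                  # PPV=0.600
```

Recall and specificity are each computed within a single group, so they do not depend on how many controls were sampled per case. PPV mixes the two groups: doubling the control group at the same specificity doubles the expected false positives and lowers PPV. This is why a PPV measured on a development sample with an unrepresentative case-to-control ratio needs adjusting before it is read as a population figure.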

Results: Of all the patients in the PNH group, 27% were classified as positive (recall). In the control group, 99.99% of patients were classified as negative (specificity). Of all the patients classified as positive, 60.4% had a PNH diagnosis in their record (PPV). The PPV adjusted for the population prevalence of PNH was 19.59%, suggesting that nearly 1 in 5 flagged patients may warrant further investigation for PNH. The key clinical features in the model were aplastic anaemia, pancytopenia, haemolytic anaemia, myelodysplastic syndrome, and Budd-Chiari syndrome.
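The abstract does not spell out how PPV was adjusted for prevalence. One standard approach, shown here as an assumption rather than the authors' documented method, applies Bayes' theorem to the recall (sensitivity) and specificity measured on the development sample together with a chosen population prevalence. In this sketch, `adjusted_ppv` is a hypothetical helper name, and the prevalence of 1 in 50,000 used in the example is an illustrative placeholder, not an estimate of PNH prevalence. Because the abstract rounds specificity to 99.99%, these inputs are not expected to reproduce the reported 19.59%.

```python
def adjusted_ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Expected PPV at a given prevalence, by Bayes' theorem.

    PPV = sens * p / (sens * p + (1 - spec) * (1 - p))
    All three inputs are proportions in [0, 1].
    """
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("prevalence", prevalence),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Expected fraction of the population that is flagged and has PNH.
    true_pos = sensitivity * prevalence
    # Expected fraction of the population that is flagged but does not.
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no patients would be flagged; PPV is undefined")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Recall of 27% comes from the abstract; the prevalence and the two
    # specificity values are illustrative placeholders (assumptions).
    recall = 0.27
    prevalence = 1 / 50_000
    for spec in (0.9999, 0.99999):
        value = adjusted_ppv(recall, spec, prevalence)
        print(f"specificity={spec:.5f} -> adjusted PPV={value:.1%}")
```

At very low prevalence the false-positive term (1 - spec)(1 - p) dominates the denominator, so the result is highly sensitive to specificity: in the example, moving from 99.99% to 99.999% specificity raises the PPV several-fold. The unrounded specificity therefore matters when applying this kind of adjustment.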

Conclusion: This is the first study to combine clinical understanding of PNH with machine learning, demonstrating the ability to discriminate between PNH and control patients in retrospective electronic health records. With further investigation and validation, this algorithm could be deployed on live health data, potentially leading to earlier diagnosis for patients who currently experience long diagnostic delays or remain undiagnosed.

Source journal

Orphanet Journal of Rare Diseases (Medicine: Research & Experimental)

CiteScore: 6.30
Self-citation rate: 8.10%
Articles published: 418
Review time: 4-8 weeks

Journal description: Orphanet Journal of Rare Diseases is an open access, peer-reviewed journal that encompasses all aspects of rare diseases and orphan drugs. The journal publishes high-quality reviews on specific rare diseases. It may also consider clinical trial outcome reports, whether positive or negative, and articles on public health issues in the field of rare diseases and orphan drugs. The journal does not accept case reports.