Quantifying and improving rheumatoid arthritis algorithm performance in biobank settings
Vanessa L. Kronzer, Katrina A. Williamson, Andrew C. Hanson, Jennifer A. Sletten, Jeffrey A. Sparks, John M. Davis III, Cynthia S. Crowson
Seminars in Arthritis and Rheumatism, volume 72, Article 152668. Published 2025-02-22. DOI: 10.1016/j.semarthrit.2025.152668
https://www.sciencedirect.com/science/article/pii/S0049017225000393
Abstract
Objective
To quantify and improve the performance of standard rheumatoid arthritis (RA) algorithms in a biobank setting.
Methods
This retrospective cohort study within the Mayo Clinic (MC) Biobank and MC Tapestry Study identified RA cases by the presence of at least two RA codes or positive anti-cyclic citrullinated peptide (anti-CCP) antibodies plus a disease-modifying anti-rheumatic drug (DMARD) prescription as of July 18, 2022. Rheumatology physicians manually verified all RA cases using RA criteria and/or rheumatology physician diagnosis plus DMARD use. All other biobank participants served as non-RA controls. We defined seropositivity as rheumatoid factor and/or anti-CCP positivity. We assessed rules-based and Electronic Medical Records and Genomics (eMERGE) RA algorithms using positive predictive value (PPV). Finally, we developed a novel RA algorithm using a LASSO-based machine learning approach with five-fold cross-validation.
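The screening rule and the PPV metric described above can be illustrated with a minimal Python sketch. It is not the study's code: the field names (`ra_code_count`, `ccp_positive`, `dmard_prescribed`, `confirmed_ra`) and the toy cohort are hypothetical, and the rule is read here as "at least two RA codes, or (anti-CCP positive and DMARD prescribed)", which is one plausible interpretation of the wording.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All fields are hypothetical names chosen for illustration.
    ra_code_count: int      # number of RA diagnosis codes on record
    ccp_positive: bool      # positive anti-CCP antibody test
    dmard_prescribed: bool  # any DMARD prescription
    confirmed_ra: bool      # outcome of manual physician review


def screens_positive(p: Participant) -> bool:
    """Assumed reading of the rule: >=2 RA codes, or anti-CCP positive plus DMARD."""
    return p.ra_code_count >= 2 or (p.ccp_positive and p.dmard_prescribed)


def ppv(participants, rule):
    """Positive predictive value: confirmed cases among those the rule flags."""
    flagged = [p for p in participants if rule(p)]
    if not flagged:
        return None  # PPV is undefined when nothing is flagged
    return sum(p.confirmed_ra for p in flagged) / len(flagged)


# Toy data, invented for this example only.
toy_cohort = [
    Participant(3, True, True, True),
    Participant(2, False, False, False),
    Participant(0, True, True, True),
    Participant(1, False, True, False),
    Participant(0, False, False, False),
]

toy_ppv = ppv(toy_cohort, screens_positive)
```

Passing a different `rule` function (for example, one that also requires seropositivity) to `ppv` gives a like-for-like comparison of candidate rules on the same cohort.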
Results
We identified 1,316 confirmed RA cases (968 MC Biobank, 348 Tapestry, 70 % seropositive) and 82,123 non-RA controls (mean age 65, 61 % female). The PPV of 3 RA codes was 43 %, codes plus DMARD was 54 %, and codes plus DMARD plus seropositivity was 85 %. The PPV of eMERGE was 77 %. Self-reported RA, which was available in the MC Biobank (PPV 10 %), only minimally improved algorithm performance (PPV from 83 % to 85 %), whereas family history of RA (PPV 3 %) worsened performance. At 90 % PPV, the novel RA algorithm incorporating key variables such as anti-CCP and DMARD use increased sensitivity by 4–11 % compared to eMERGE.
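Comparing algorithms "at 90 % PPV" means fixing the PPV and asking how many true cases each algorithm still captures. The sketch below shows one way to do that for any algorithm that outputs a score: pick the lowest score cutoff whose PPV still meets the target, then report sensitivity at that cutoff. The function name, scores, and labels are hypothetical, and this is not necessarily how the authors performed the comparison.

```python
def sensitivity_at_ppv(scores, labels, target_ppv=0.90):
    """Return (cutoff, sensitivity) for the most sensitive cutoff with PPV >= target.

    scores: algorithm output per participant (higher = more likely RA)
    labels: 1 for confirmed RA, 0 for non-RA control
    Returns None if there are no cases or no cutoff reaches the target PPV.
    """
    total_cases = sum(labels)
    if total_cases == 0:
        return None
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = None
    true_positives = 0
    for i, (score, label) in enumerate(ranked):
        true_positives += label
        # Tied scores must be flagged together, so evaluate only at the last tie.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        flagged = i + 1
        if true_positives / flagged >= target_ppv:
            # Sensitivity never decreases as the cutoff drops, so keep the latest hit.
            best = (score, true_positives / total_cases)
    return best


# Toy scores and labels, invented for this example only.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 1, 1, 0, 1, 0, 0]

result = sensitivity_at_ppv(toy_scores, toy_labels, target_ppv=0.90)
```

Running the same function on two algorithms' scores and taking the difference in sensitivity gives the kind of comparison reported above.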
Conclusion
Rules-based and eMERGE RA algorithms performed worse in biobank settings than in administrative settings. Our novel RA algorithm outperformed these standard algorithms.
About the journal:
Seminars in Arthritis and Rheumatism provides access to the highest-quality clinical, therapeutic and translational research about arthritis, rheumatology and musculoskeletal disorders that affect the joints and connective tissue. Each bimonthly issue includes articles giving you the latest diagnostic criteria, consensus statements, systematic reviews and meta-analyses as well as clinical and translational research studies. Read this journal for the latest groundbreaking research and to gain insights from scientists and clinicians on the management and treatment of musculoskeletal and autoimmune rheumatologic diseases. The journal is of interest to rheumatologists, orthopedic surgeons, internal medicine physicians, immunologists and specialists in bone and mineral metabolism.