Correctness and Completeness of Breast Cancer Diagnoses Recorded in UK CPRD Aurum and CPRD GOLD Databases: Comparison to Hospital Episode Statistics and Cancer Registry (Companion Paper 2)
Katrina Wilcox Hagberg, Catherine Vasilakis-Scaramozza, Rebecca Persson, David Neasham, George Kafatos, Susan Jick
Clinical Epidemiology, published 2023-12-16. DOI: 10.2147/clep.s434829 (https://doi.org/10.2147/clep.s434829)
Abstract
Purpose: To evaluate the new Clinical Practice Research Datalink (CPRD) Aurum database, we estimated "correctness" (i.e., accuracy, validity) and "completeness" (i.e., presence, missingness) of malignant breast cancer diagnoses recorded in CPRD Aurum compared to external linked data sources: Hospital Episode Statistics (HES) Admitted Patient Care (APC), HES Outpatient (OP), and Cancer Registry (CR), and to the previously validated CPRD GOLD.

Methods: Linkage-eligible female patients with an incident malignant breast cancer diagnosis recorded in at least one study data source were selected. Correctness was the proportion of malignant breast cancer cases recorded in CPRD Aurum or GOLD who also had a diagnosis recorded in HES APC/OP (2004–2019) or CR (2004–2016). Completeness was estimated by identifying all malignant breast cancer diagnoses in HES APC/OP or CR and calculating the proportion with a concordant diagnosis in CPRD Aurum or GOLD.

Results: Compared to HES APC/OP, there were 85,659 and 31,452 eligible patients in CPRD Aurum and GOLD, respectively. Correctness estimates were high (CPRD Aurum 83.5%, GOLD 81.7%). Compared to CR, there were 70,190 and 29,597 eligible patients in CPRD Aurum and GOLD, respectively; correctness was 89.1% for CPRD Aurum and 88.2% for GOLD. Completeness estimates for CPRD Aurum and GOLD were high (>90%). Diagnoses were recorded in CPRD Aurum within −7 to 74 days of those in the linked sources. Reasons for discordant diagnostic coding included presence of treatment or other clinical codes only, diagnosis coded after end of follow-up, non-malignant breast cancer in linked data, and administrative codes in lieu of diagnostic codes.

Conclusion: These results indicate that correctness and completeness of malignant breast cancer diagnoses in CPRD Aurum were high and similar to CPRD GOLD. This provides confidence in the use of CPRD Aurum for research purposes. Where complete case capture is important, researchers should consider linkage to HES APC or CR.
Keywords: CPRD Aurum, CPRD GOLD, breast cancer, validation
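The two measures defined in the Methods can be illustrated with a minimal Python sketch. It assumes each data source has already been reduced to a set of patient identifiers with an incident malignant breast cancer diagnosis. The function names, the set-based representation, and the toy identifiers are hypothetical and are not taken from the study; the real analysis also applied eligibility and follow-up criteria that are not modelled here.

```python
def correctness(cprd_cases, linked_cases):
    """Proportion of CPRD-recorded cases that also appear in the linked source.

    Denominator: cases recorded in CPRD (Aurum or GOLD).
    """
    cprd = set(cprd_cases)
    if not cprd:
        raise ValueError("no CPRD cases to evaluate")
    return len(cprd & set(linked_cases)) / len(cprd)


def completeness(cprd_cases, linked_cases):
    """Proportion of linked-source cases with a concordant CPRD diagnosis.

    Denominator: cases recorded in the linked source (HES APC/OP or CR).
    """
    linked = set(linked_cases)
    if not linked:
        raise ValueError("no linked-source cases to evaluate")
    return len(linked & set(cprd_cases)) / len(linked)


# Toy identifiers for illustration only; these are NOT study data.
cprd_ids = {1, 2, 3, 4, 5}
linked_ids = {2, 3, 4, 5, 6, 7, 8, 9}

print(f"correctness:  {correctness(cprd_ids, linked_ids):.1%}")
print(f"completeness: {completeness(cprd_ids, linked_ids):.1%}")
```

The two functions share the same numerator (patients found in both sources) and differ only in the denominator, which is why a database can score well on one measure and less well on the other.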
About the journal:
Clinical Epidemiology is an international, peer-reviewed, open access journal. Clinical Epidemiology focuses on the application of epidemiological principles and questions relating to patients and clinical care in terms of prevention, diagnosis, prognosis, and treatment.
Clinical Epidemiology welcomes papers covering these topics in the form of original research and systematic reviews.
Clinical Epidemiology has a special interest in international electronic medical patient records and other routine health care data, especially as applied to safety of medical interventions, clinical utility of diagnostic procedures, understanding short- and long-term clinical course of diseases, clinical epidemiological and biostatistical methods, and systematic reviews.
When considering submission of a paper utilizing publicly available data, authors should ensure that such studies add significantly to the body of knowledge and that they use appropriate validated methods for identifying health outcomes.
The journal has launched special series describing existing data sources for clinical epidemiology, international health care systems and validation studies of algorithms based on databases and registries.