Sjoerd van Alten, Benjamin W Domingue, Jessica Faul, Titus Galama, Andries T Marees
Reweighting UK Biobank corrects for pervasive selection bias due to volunteering.
International Journal of Epidemiology, volume 53, issue 3. Published 2024-04-11.
DOI: 10.1093/ije/dyae054 (https://doi.org/10.1093/ije/dyae054)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11076923/pdf/
Abstract
Background: Biobanks typically rely on volunteer-based sampling. This results in large samples (power) at the cost of representativeness (bias). The problem of volunteer bias is debated. Here, we (i) show that volunteering biases associations in UK Biobank (UKB) and (ii) estimate inverse probability (IP) weights that correct for volunteer bias in UKB.
Methods: Drawing on UK Census data, we constructed a subsample representative of UKB's target population, which consists of all individuals invited to participate. Based on demographic variables shared between the UK Census and UKB, we estimated IP weights (IPWs) for each UKB participant. We compared 21 weighted and unweighted bivariate associations between these demographic variables to assess volunteer bias.
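The weighting idea in the Methods can be illustrated with a minimal sketch. It is not the authors' estimation procedure: it assumes a single categorical demographic cell per person and a saturated (cell-by-cell) participation model, and the function name `ip_weights_by_cell` and the toy data are hypothetical.

```python
import numpy as np

def ip_weights_by_cell(sample_cells, reference_cells):
    """Inverse probability weights from a saturated, cell-based model.

    For each demographic cell the weight is
        (cell share in the reference population) / (cell share in the volunteer sample),
    which is proportional to 1 / P(participation | cell).
    Assumes every cell seen in the sample also appears in the reference data.
    """
    sample_cells = np.asarray(sample_cells)
    reference_cells = np.asarray(reference_cells)

    cells, counts = np.unique(sample_cells, return_counts=True)
    sample_share = dict(zip(cells, counts / sample_cells.size))

    ref_cells, ref_counts = np.unique(reference_cells, return_counts=True)
    ref_share = dict(zip(ref_cells, ref_counts / reference_cells.size))

    weights = np.array([ref_share[c] / sample_share[c] for c in sample_cells])
    return weights / weights.mean()  # normalise so the weights average to 1

# Toy data (hypothetical): the reference population is 50% "low" and 50% "high",
# but the volunteer sample over-represents "high" (80%).
reference = ["low"] * 5 + ["high"] * 5
volunteers = ["low"] * 2 + ["high"] * 8

w = ip_weights_by_cell(volunteers, reference)
is_low = np.array(volunteers) == "low"

print("unweighted share of 'low':", is_low.mean())
print("weighted share of 'low':  ", np.average(is_low, weights=w))
```

In this toy case each "low" volunteer receives weight 0.5 / 0.2 = 2.5 and each "high" volunteer 0.5 / 0.8 = 0.625, so the weighted sample reproduces the 50/50 split of the reference population. With many demographic variables, a regression model of participation would usually replace the cell shares.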
Results: Volunteer bias in all associations, as naively estimated in UKB, was substantial, in some cases so severe that unweighted estimates had the opposite sign to the association in the target population. For example, older individuals in UKB reported being in better health, in contrast to evidence from the UK Census. Using IPWs in weighted regressions reduced volunteer bias by 87% on average. Volunteer-based sampling reduced the effective sample size of UKB substantially, to 32% of its original size.
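To make the notion of a reduced effective sample size concrete, the sketch below uses Kish's approximation, n_eff = (Σw)² / Σw². Whether the paper uses exactly this formula is an assumption here, and the weights are toy values rather than UKB weights.

```python
import numpy as np

def kish_effective_sample_size(weights):
    """Kish's approximation: n_eff = (sum of weights)^2 / (sum of squared weights)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Toy weights (hypothetical): two under-represented participants are weighted up,
# eight over-represented participants are weighted down.
weights = np.array([2.5, 2.5] + [0.625] * 8)

n = weights.size
n_eff = kish_effective_sample_size(weights)
print(f"nominal n = {n}, effective n = {n_eff:.2f}, ratio = {n_eff / n:.2f}")
```

Equal weights give n_eff = n. The more unequal the weights, the smaller n_eff becomes, which is the sense in which correcting for strong selection costs statistical precision.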
Conclusions: Estimates from large-scale biobanks may be misleading due to volunteer bias. We recommend IP weighting to correct for such bias. To aid in the construction of the next generation of biobanks, we provide suggestions on how to best ensure representativeness in a volunteer-based design. For UKB, IPWs have been made available.
About the journal:
The International Journal of Epidemiology is a vital resource for individuals seeking to stay updated on the latest advancements and emerging trends in the field of epidemiology worldwide.
The journal fosters communication among researchers, educators, and practitioners involved in the study, teaching, and application of epidemiology pertaining to both communicable and non-communicable diseases. It also includes research on health services and medical care.
Furthermore, the journal presents new methodologies in epidemiology and statistics, catering to professionals working in social and preventive medicine. Published six times a year, the International Journal of Epidemiology provides a comprehensive platform for the analysis of data.
Overall, this journal is an indispensable tool for staying informed and connected within the dynamic realm of epidemiology.