Sjoerd van Alten, Benjamin W Domingue, Jessica Faul, Titus Galama, Andries T Marees
Background: Biobanks typically rely on volunteer-based sampling. This results in large samples (power) at the cost of representativeness (bias). The problem of volunteer bias is debated. Here, we (i) show that volunteering biases associations in UK Biobank (UKB) and (ii) estimate inverse probability (IP) weights that correct for volunteer bias in UKB.
Methods: Drawing on UK Census data, we constructed a subsample representative of UKB's target population, which consists of all individuals invited to participate. Based on demographic variables shared between the UK Census and UKB, we estimated IP weights (IPWs) for each UKB participant. We compared 21 weighted and unweighted bivariate associations between these demographic variables to assess volunteer bias.
Results: Volunteer bias in all associations, as naively estimated in UKB, was substantial, in some cases so severe that unweighted estimates had the opposite sign of the association in the target population. For example, older individuals in UKB reported being in better health, in contrast to evidence from the UK Census. Using IPWs in weighted regressions reduced volunteer bias by 87% on average. Volunteer-based sampling reduced the effective sample size of UKB substantially, to 32% of its original size.
Conclusions: Estimates from large-scale biobanks may be misleading due to volunteer bias. We recommend IP weighting to correct for such bias. To aid in the construction of the next generation of biobanks, we provide suggestions on how to best ensure representativeness in a volunteer-based design. For UKB, IPWs have been made available.
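The weighting idea above can be illustrated with a minimal sketch. It assumes each participant has an estimated participation probability, takes the IP weight as its reciprocal, and uses Kish's approximation for the effective sample size; the abstract does not say which effective-sample-size formula the authors used, and all function names and numbers here are hypothetical.

```python
def ip_weights(participation_probs):
    """Inverse probability weights: 1 / P(participation) per person."""
    return [1.0 / p for p in participation_probs]


def weighted_mean(values, weights):
    """Weighted mean, the simplest weighted estimate."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def kish_effective_n(weights):
    """Kish approximation: (sum of w)^2 / sum of w^2 (an assumption here)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Toy data (hypothetical): two people, one five times likelier to volunteer.
weights = ip_weights([0.5, 0.1])          # the rarer volunteer gets more weight
estimate = weighted_mean([0.0, 1.0], weights)
n_eff = kish_effective_n(weights)         # below 2 because the weights are unequal
```

Equal weights leave the effective sample size at the nominal size; the more unequal the weights, the further it falls, which is the power cost the abstract reports.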
Reweighting UK Biobank corrects for pervasive selection bias due to volunteering. International Journal of Epidemiology 53(3). Published 2024-04-11. doi:10.1093/ije/dyae054. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11076923/pdf/
Wenhua Yu, Wenzhong Huang, Antonio Gasparrini, Francesco Sera, Alexandra Schneider, Susanne Breitner, Jan Kyselý, Joel Schwartz, Joana Madureira, Vânia Gaio, Yue Leon Guo, Rongbin Xu, Gongbo Chen, Zhengyu Yang, Bo Wen, Yao Wu, Antonella Zanobetti, Haidong Kan, Jiangning Song, Shanshan Li, Yuming Guo
Background: Model-estimated air pollution exposure products have been widely used in epidemiological studies to assess the health risks of particulate matter with diameters of ≤2.5 µm (PM2.5). However, few studies have assessed the disparities in health effects between model-estimated and station-observed PM2.5 exposures.
Methods: We collected daily all-cause, respiratory and cardiovascular mortality data in 347 cities across 15 countries and regions worldwide based on the Multi-City Multi-Country collaborative research network. The station-observed PM2.5 data were obtained from official monitoring stations. The model-estimated global PM2.5 product was developed using a machine-learning approach. The associations between daily exposure to PM2.5 and mortality were evaluated using a two-stage analytical approach.
Results: We included 15.8 million all-cause, 1.5 million respiratory and 4.5 million cardiovascular deaths from 2000 to 2018. Short-term exposure to PM2.5 was associated with a relative risk increase (RRI) of mortality from both station-observed and model-estimated exposures. Every 10-µg/m³ increase in the 2-day moving average PM2.5 was associated with overall RRIs of 0.67% (95% CI: 0.49 to 0.85), 0.68% (95% CI: -0.03 to 1.39) and 0.45% (95% CI: 0.08 to 0.82) for all-cause, respiratory and cardiovascular mortality, respectively, based on station-observed PM2.5, and RRIs of 0.87% (95% CI: 0.68 to 1.06), 0.81% (95% CI: 0.08 to 1.55) and 0.71% (95% CI: 0.32 to 1.09), respectively, based on model-estimated exposure.
Conclusions: Mortality risks associated with daily PM2.5 exposure were consistent for both station-observed and model-estimated exposures, suggesting the reliability and potential applicability of the global PM2.5 product in epidemiological studies.
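As a reading aid for the RRI figures, the sketch below converts between a log-linear exposure coefficient and a percentage relative risk increase per 10 µg/m³. It assumes a log-linear exposure-response model, which the abstract does not spell out, and the function names are hypothetical.

```python
import math


def rri_percent(beta, increment=10.0):
    """Percent increase in risk per `increment` units of PM2.5,
    given a log-relative-risk coefficient `beta` per unit of exposure."""
    return (math.exp(beta * increment) - 1.0) * 100.0


def beta_from_rri(rri, increment=10.0):
    """Inverse conversion: per-unit log relative risk from a percent RRI."""
    return math.log(1.0 + rri / 100.0) / increment


# Round trip using the station-observed all-cause estimate from the abstract.
beta_all_cause = beta_from_rri(0.67)
recovered = rri_percent(beta_all_cause)
```

Confidence limits convert the same way, by applying `rri_percent` to the lower and upper limits of the coefficient.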
Ambient fine particulate matter and daily mortality: a comparative analysis of observed and estimated exposure in 347 cities. International Journal of Epidemiology 53(3). Published 2024-04-11. doi:10.1093/ije/dyae066. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082424/pdf/
Marit Næss, Kirsti Kvaløy, Elin P Sørgjerd, Kristin S Sætermo, Lise Norøy, Ann Helen Røstad, Nina Hammer, Trine Govasli Altø, Anne Jorunn Vikdal, Kristian Hveem
Data Resource Profile: The HUNT Biobank. International Journal of Epidemiology 53(3). Published 2024-04-11. doi:10.1093/ije/dyae073. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11150882/pdf/
Simon R Procter, Proma Paul, Erzsébet Horváth-Puhó, Bronner P Gonçalves
Background: Maternal colonization by the bacterium Group B streptococcus (GBS) increases risk of preterm birth, a condition that has an important impact on the health of children. However, research studies that quantify the effect of GBS colonization on preterm birth have reported variable estimates of the effect measure.
Methods: We performed a simulated cohort study of pregnant women to assess how timing of exposure (GBS colonization) assessment might influence results of studies that address this question. We used published data on longitudinal maternal GBS colonization and on the distribution of preterm births by gestational age to inform parameters used in the simulations.
Results: Assuming that the probability of preterm birth is higher during weeks when pregnant women are colonized by GBS, our results suggest that studies that assess exposure status early during pregnancy are more likely to estimate an association between GBS colonization and preterm birth that is closer to the null, compared with studies that assess exposure either at birth or during gestational weeks matched to preterm births. In sensitivity analyses assuming different colonization acquisition rates and diagnostic sensitivities, we observed similar results.
Conclusions: Accurate quantification of the effect of maternal GBS colonization on the risk of preterm birth is necessary to understand the full health burden linked to this bacterium. In this study, we investigated one possible explanation, related to the timing of exposure assessment, for the variable findings of previous observational studies. Our findings will inform future research on this question.
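The mechanism described above can be sketched with a toy simulation. This is not the authors' model: every parameter value (starting prevalence, weekly acquisition and loss probabilities, baseline hazard, true effect) is hypothetical, and the only assumption carried over from the abstract is that the weekly probability of preterm birth is higher in weeks when a woman is colonized.

```python
import random


def simulate(n=100_000, seed=1, rr_true=3.0, base_hazard=0.005,
             p_start=0.20, p_acquire=0.03, p_lose=0.05):
    """Return risk ratios for preterm birth under two exposure definitions."""
    rng = random.Random(seed)
    # counts[definition][exposed] = [preterm births, women]
    counts = {name: {True: [0, 0], False: [0, 0]}
              for name in ("early", "at_birth")}
    for _ in range(n):
        colonized = rng.random() < p_start
        early = colonized                 # status assessed at week 20
        preterm = False
        for _week in range(20, 37):       # weeks at risk of preterm birth
            hazard = base_hazard * (rr_true if colonized else 1.0)
            if rng.random() < hazard:
                preterm = True
                break
            # colonization status may change before the next week
            if colonized:
                colonized = rng.random() >= p_lose
            else:
                colonized = rng.random() < p_acquire
        # `colonized` now holds the status at delivery (or at week 37)
        for name, exposed in (("early", early), ("at_birth", colonized)):
            counts[name][exposed][0] += preterm
            counts[name][exposed][1] += 1

    def risk_ratio(c):
        return (c[True][0] / c[True][1]) / (c[False][0] / c[False][1])

    return {name: risk_ratio(c) for name, c in counts.items()}


result = simulate()
```

Under these assumptions the early-assessment risk ratio lies closer to 1 than the at-birth one, because many women change colonization status between week 20 and delivery.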
Timing of exposure assessment in studies on Group B streptococcus colonization and preterm birth. International Journal of Epidemiology 53(3). Published 2024-04-11. doi:10.1093/ije/dyae076.
Lauren E McCullough, Lindsay J Collin, Muriel Statman
Unravelling race inequities in cardiovascular disease mortality among cancer survivors: new insights and future directions. International Journal of Epidemiology 53(3). Published 2024-04-11. doi:10.1093/ije/dyae049.
Emmanouil Bouras, Dipender Gill, Verena Zuber, Neil Murphy, Niki Dimou, Krasimira Aleksandrova, Sarah J Lewis, Richard M Martin, James Yarmolinsky, Demetrius Albanes, Hermann Brenner, Sergi Castellví-Bel, Andrew T Chan, Iona Cheng, Stephen Gruber, Bethany Van Guelpen, Christopher I Li, Loic Le Marchand, Polly A Newcomb, Shuji Ogino, Andrew Pellatt, Stephanie L Schmit, Alicja Wolk, Anna H Wu, Ulrike Peters, Marc J Gunter, Konstantinos K Tsilidis
Background: Colorectal cancer (CRC) is the third-most-common cancer worldwide and its rates are increasing. Elevated body mass index (BMI) is an established risk factor for CRC, although the molecular mechanisms behind this association remain unclear. Using the Mendelian randomization (MR) framework, we aimed to investigate the mediating effects of putative biomarkers and other CRC risk factors in the association between BMI and CRC.
Methods: We selected as mediators biomarkers of established cancer-related mechanisms and other CRC risk factors for which a plausible association with obesity exists, such as inflammatory biomarkers, glucose homeostasis traits, lipids, adipokines, insulin-like growth factor 1 (IGF1), sex hormones, 25-hydroxy-vitamin D, smoking, physical activity (PA) and alcohol consumption. We used inverse-variance weighted MR in the main univariable analyses and performed sensitivity analyses (weighted-median, MR-Egger, Contamination Mixture). We used multivariable MR for the mediation analyses.
Results: Genetically predicted BMI was positively associated with CRC risk [odds ratio per SD (5 kg/m²) = 1.17, 95% CI: 1.08-1.24, P-value = 1.4 × 10⁻⁵] and robustly associated with nearly all potential mediators. Genetically predicted IGF1, fasting insulin, low-density lipoprotein cholesterol, smoking, PA and alcohol were associated with CRC risk. Evidence for attenuation was found for IGF1 [explained 7% (95% CI: 2-13%) of the association], smoking (31%, 4-57%) and PA (7%, 2-11%). There was little evidence for pleiotropy, although smoking was bidirectionally associated with BMI and instruments were weak for PA.
Conclusions: The effect of BMI on CRC risk is possibly partly mediated through plasma IGF1, whereas the attenuation of the BMI-CRC association by smoking and PA may reflect confounding and shared underlying mechanisms rather than mediation.
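The inverse-variance weighted estimator named in the Methods can be sketched as follows. It uses the standard fixed-effect IVW formula on per-variant summary statistics. The attenuation helper applies the simple difference method, which is an assumption and may not match the authors' multivariable MR procedure. All input numbers are made up.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure's effect on the outcome.

    Inputs are per-variant associations with the exposure, with the
    outcome, and the standard errors of the outcome associations.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


def proportion_attenuated(total_effect, direct_effect):
    """Share of the total effect that disappears after adjusting for a mediator."""
    return (total_effect - direct_effect) / total_effect


# Toy variants (hypothetical) whose outcome effect is exactly twice the exposure effect.
beta, se = ivw_estimate([0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.05, 0.05, 0.05])
odds_ratio = math.exp(beta)               # log odds scale to odds ratio
share = proportion_attenuated(0.20, 0.15)
```

Effects are combined on the log odds scale and exponentiated only for reporting, which is how an odds ratio per SD of BMI would be obtained from such an estimate.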
Identification of potential mediators of the relationship between body mass index and colorectal cancer: a Mendelian randomization analysis. International Journal of Epidemiology 53(3). Published 2024-04-11. doi:10.1093/ije/dyae067. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082423/pdf/
Chiara Sacco, Mattia Manica, Valentina Marziano, Massimo Fabiani, Alberto Mateo-Urdiales, Giorgio Guzzetta, Stefano Merler, Patrizio Pezzotti
Background: Surveillance data and vaccination registries are widely used to provide real-time vaccine effectiveness (VE) estimates, which can be biased due to underreported (i.e. under-ascertained and under-notified) infections. Here, we investigate how the magnitude and direction of this source of bias in retrospective cohort studies vary under different circumstances, including different levels of underreporting, heterogeneities in underreporting across vaccinated and unvaccinated, and different levels of pathogen circulation.
Methods: We developed a stochastic individual-based model simulating the transmission dynamics of a respiratory virus and a large-scale vaccination campaign. Considering a baseline scenario with 22.5% yearly attack rate and 30% reporting ratio, we explored fourteen alternative scenarios, each modifying one or more baseline assumptions. Using synthetic individual-level surveillance data and vaccination registries produced by the model, we estimated the VE against documented infection taking as reference either unvaccinated or recently vaccinated individuals (within 14 days post-administration). Bias was quantified by comparing estimates to the known VE assumed in the model.
Results: VE estimates were accurate when assuming homogeneous reporting ratios, even at low levels (10%), and moderate attack rates (<50%). A substantial downward bias in the estimation arose with homogeneous reporting and attack rates exceeding 50%. Mild heterogeneities in reporting ratios between vaccinated and unvaccinated strongly biased VE estimates, downward if cases in vaccinated were more likely to be reported and upward otherwise, particularly when taking as reference unvaccinated individuals.
Conclusions: In observational studies, high attack rates or differences in underreporting between vaccinated and unvaccinated may result in biased VE estimates. This study underscores the critical importance of monitoring data quality and understanding biases in observational studies, to more adequately inform public health decisions.
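The direction of the reporting bias can be seen with simple arithmetic. The sketch below is a deliberately simplified cumulative-risk calculation, not the authors' individual-based transmission model, so it does not reproduce the bias they found at high attack rates. The true VE of 60% and the heterogeneous reporting ratios are hypothetical; the 22.5% attack rate and 30% reporting ratio are the baseline values from the abstract.

```python
def observed_ve(true_ve, attack_rate_unvacc, report_vacc, report_unvacc):
    """VE computed from documented infections only (risk-ratio form)."""
    true_risk_vacc = attack_rate_unvacc * (1.0 - true_ve)
    documented_vacc = report_vacc * true_risk_vacc
    documented_unvacc = report_unvacc * attack_rate_unvacc
    return 1.0 - documented_vacc / documented_unvacc


TRUE_VE = 0.60  # hypothetical true effectiveness

# Same reporting ratio in both groups: the ratio cancels and VE is recovered.
homogeneous = observed_ve(TRUE_VE, 0.225, 0.30, 0.30)
# Infections in vaccinated people more likely to be reported: downward bias.
over_reported_vacc = observed_ve(TRUE_VE, 0.225, 0.36, 0.30)
# Infections in vaccinated people less likely to be reported: upward bias.
under_reported_vacc = observed_ve(TRUE_VE, 0.225, 0.24, 0.30)
```

In this form the reporting ratios enter only through their ratio, which is why even mild heterogeneity between groups shifts the estimate while a uniformly low reporting ratio does not.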
The impact of underreported infections on vaccine effectiveness estimates derived from retrospective cohort studies. International Journal of Epidemiology 53(3). Published 2024-04-11. doi:10.1093/ije/dyae077. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157963/pdf/