The global rise of non-communicable diseases (NCDs) presents an urgent public health challenge, particularly in regions undergoing rapid economic and demographic transitions. Guangdong Province, China’s most populous and economically advanced region, is experiencing a substantial and accelerating burden of NCDs. However, large-scale, population-based cohorts from this region remain scarce, limiting insights into region-specific disease determinants and prevention strategies. The Guangdong Biobank Cohort (GDBC) was established in 2017, enrolling 35,081 participants aged 40–84 years from urban and rural areas of Zhongshan City in the Pearl River Delta. At baseline, comprehensive data on 346 variables (including lifestyle, environmental exposures, medical histories, physical examinations, and laboratory profiles) were collected via a cloud-based member management information system (MMIS), alongside blood and saliva samples for biobanking. A sub-cohort underwent genome-wide genotyping (N = 2,530) and oral microbiome profiling via 16S rRNA sequencing (N = 2,049). During dynamic follow-up, 44.2% (N = 15,499) completed the Phase I resurvey with repeated measurements and updated biospecimens. Disease outcomes, including hypertension, diabetes, and cancer, were ascertained through active surveillance and regional registry linkage until December 2023. Baseline prevalence of hypertension, diabetes, and cancer was 25.3%, 8.0%, and 3.6%, respectively. Over follow-up, 1,767 hypertension cases, 814 diabetes cases, and 558 cancers were recorded, yielding crude incidence rates of 1,804.6, 649.7, and 423.1 per 100,000 person-years, respectively. The GDBC provides a comprehensive, dynamically updated resource to dissect gene–microbiome–environment interactions and develop precision prevention strategies to inform public health policies.
Guangdong Biobank Cohort (GDBC) study. Yong-Qiao He, Wen-Qiong Xue, Hua Diao, Ji-Yun Zhan, Ming-Fang Ji, Da-Wei Yang, Yi Zhao, Chang-Mi Deng, Zi-Yi Wu, Ting Zhou, Ying Liao, Mei-Qi Zheng, Wen-Li Zhang, Yi-Jing Jia, Lei-Lei Yuan, Lu-Ting Luo, Dan-Hua Li, Tong-Min Wang, Xia-Ting Tong, Yan Du, Ling-Ling Tang, Jing-Wen Huang, Chang-ling Huang, Zhi-Yang Zhao, Yan-Xia Wu, Lian-Jing Cao, Si-Qi Dong, Fang Wang, Cheng-Tao Jiang, Ruo-Wen Xiao, Wen-Bin Zhang, Xue-Yin Chen, Qiao-Ling Wang, Qiao-Yun Liu, Yue-Ze Zhao, Cao-Li Tang, Lin Ma, Xiao-Hui Zheng, Pei-Fen Zhang, Xi-Zhao Li, Shao-Dan Zhang, Ye-Zhu Hu, Xia Yu, Biao-Hua Wu, Fu-Gui Li, Jian-Hua Wu, Bi-Sen Deng, Xue-Jun Liang, Wei-Hua Jia. European Journal of Epidemiology. Pub Date: 2026-01-13. DOI: 10.1007/s10654-025-01320-y
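The crude incidence rates in the GDBC abstract follow the usual events-per-person-time formula. The Python sketch below is only an illustration: the function names are made up for this example, and the person-years it prints are back-calculated from the reported counts and rates, not figures stated by the authors.

```python
def crude_incidence_rate(cases: int, person_years: float, per: int = 100_000) -> float:
    """Crude incidence rate: events divided by person-time at risk, scaled to `per`."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases / person_years * per


def implied_person_years(cases: int, rate: float, per: int = 100_000) -> float:
    """Invert the rate formula to recover the person-time implied by a reported rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return cases / rate * per


# Case counts and rates as reported in the abstract (rates per 100,000 person-years).
reported = {
    "hypertension": (1767, 1804.6),
    "diabetes": (814, 649.7),
    "cancer": (558, 423.1),
}

for outcome, (cases, rate) in reported.items():
    py = implied_person_years(cases, rate)
    print(f"{outcome}: about {py:,.0f} person-years implied")
```

The implied denominators differ by outcome, plausibly because participants who already had a condition at baseline do not contribute person-time at risk for that condition; the abstract does not state this explicitly.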
Acute lymphoblastic leukemia (ALL) is the most common childhood malignancy. While space-time clustering of ALL cases has been suggested, only one prior study has examined clustering by genetic subtype. We investigated space-time clustering of childhood ALL in Sweden, both overall and by genetic subtype. The cohort included 1,629 children aged 0-18 years diagnosed with ALL between 1992 and 2017, comprising 1,446 B-cell precursor ALL (BCP-ALL) and 183 T-cell ALL (T-ALL) cases. Two BCP-ALL subgroups were analyzed: high hyperdiploidy (HeH, n = 466) and ETV6::RUNX1 (n = 225). The Unbiased Knox Test and Unbiased Combined Knox Test were used to assess space-time clustering at the municipality level, accounting for multiple testing and population shifts. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm was applied to identify significant clusters. Logistic regression was used to evaluate demographic differences between clusters, including age, sex, and birth order. Significant space-time clustering was observed in the HeH subgroup for both place and date of birth (p = 0.005) and place and date of diagnosis (p = 0.011), at space-time thresholds of 40 km/18 months and 30 km/24 months, respectively. No clustering was detected in the overall BCP-ALL group, T-ALL group, or the ETV6::RUNX1 subgroup. Space-time clustering at birth and diagnosis was observed in the HeH subgroup, suggesting potential etiologic heterogeneity in BCP-ALL. These findings support further investigation of environmental and infectious exposures across immunophenotypes and genetic subtypes in larger cohorts.
Space-time clustering of childhood high hyperdiploid B-cell precursor acute lymphoblastic leukemia: a nationwide Swedish study. Gleb Bychkov, Niklas Engsner, Benedicte Bang, Mats Marshall Heyman, Gisela Barbany, Anna Skarin Nordenvall, Giorgio Tettamanti, Claes Strannegård, Ann Nordgren. European Journal of Epidemiology. Pub Date: 2026-01-12. DOI: 10.1007/s10654-025-01323-9
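To make the idea of a space-time threshold such as 40 km/18 months concrete, here is a sketch of the classical Knox statistic with a Monte Carlo permutation p-value. This is not the Unbiased Knox Test or the Unbiased Combined Knox Test used in the study, which additionally account for multiple testing and population shifts. The function names, the planar coordinates in kilometres, and the toy data are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def knox_statistic(coords, times, max_km, max_months):
    """Count case pairs that are close in both space and time."""
    close_pairs = 0
    for i, j in combinations(range(len(coords)), 2):
        near_in_space = math.dist(coords[i], coords[j]) <= max_km
        near_in_time = abs(times[i] - times[j]) <= max_months
        if near_in_space and near_in_time:
            close_pairs += 1
    return close_pairs


def knox_permutation_p(coords, times, max_km, max_months, n_perm=999, seed=0):
    """One-sided Monte Carlo p-value obtained by shuffling times across locations."""
    rng = random.Random(seed)
    observed = knox_statistic(coords, times, max_km, max_months)
    shuffled = list(times)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if knox_statistic(coords, shuffled, max_km, max_months) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data (hypothetical): planar coordinates in km, event times in months.
coords = [(0, 0), (10, 5), (5, 10), (300, 300), (310, 305), (305, 310)]
times = [0, 6, 12, 120, 126, 132]

observed, p_value = knox_permutation_p(coords, times, max_km=40, max_months=18)
print(f"close pairs: {observed}, permutation p-value: {p_value:.3f}")
```

Shuffling the times while holding the locations fixed keeps the purely spatial and purely temporal distributions intact, so only the space-time interaction is tested.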
Age-related macular degeneration (AMD) is a leading cause of irreversible vision loss worldwide. However, evidence regarding the relationship between air pollution and AMD is limited, and the modifying effect of genetic susceptibility on this association remains unknown. A total of 445,237 participants without AMD at baseline were included from the UK Biobank. The concentrations of nitrogen dioxide (NO2), nitrogen oxides (NOx), and particulate matter (PM2.5, PM10, PM2.5-10) were estimated using land-use regression models. An air pollution score (APS) was constructed by summing the pollutants, each weighted by its regression coefficient for AMD from a single-pollutant model. Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals (95% CIs) for the associations of air pollutants and polygenic risk score (PRS) with incident AMD. During a median follow-up of 13.83 years, we observed 9,635 incident AMD events. The HRs (95% CIs) of incident AMD for each standard deviation increase in NO2, NOx, PM2.5, PM10, and APS were 1.04 (1.02, 1.06), 1.03 (1.01, 1.05), 1.04 (1.02, 1.07), 1.02 (1.00, 1.04), and 1.04 (1.02, 1.06), respectively. Significant additive interactions of NO2, NOx, PM2.5-10, and APS with PRS on the risk of incident AMD were observed, with the relative excess risk due to interaction (RERI) and attributable proportion (AP), with their 95% CIs, being 0.10 (0.01, 0.18) and 0.05 (0.01, 0.11) for NO2, 0.11 (0.01, 0.19) and 0.05 (0.02, 0.10) for NOx, 0.15 (0.06, 0.23) and 0.08 (0.03, 0.13) for PM2.5-10, and 0.12 (0.03, 0.20) and 0.06 (0.01, 0.11) for APS, respectively. Compared with participants with low exposure to these air pollutants and low PRS, those with high air pollution exposure and high PRS had almost double the risk of incident AMD [HRs (95% CIs) ranged from 1.83 (1.68, 1.99) to 2.03 (1.86, 2.21)]. Long-term exposure to the air pollutants NO2, NOx, PM2.5, and PM10 was positively associated with the risk of AMD, and this association could be further enhanced by genetic susceptibility.
Air pollutants, genetic susceptibility, and the risk of age-related macular degeneration: a large prospective cohort study. Shengli Chen, Gongyue Wang, Xin Guan, Chenming Wang, Yang Xiao, Xingdi Li, Shiru Hong, Yuhan Zhou, Yingqian You, Ye Fu, Yuxi Wang, Yichi Zhang, Hui Zhao, Yingchen Zhang, Yang Cheng, Huan Guo, Huatao Xie. European Journal of Epidemiology. Pub Date: 2026-01-12. DOI: 10.1007/s10654-025-01340-8
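The two additive-interaction measures named in the AMD abstract have standard definitions: RERI = HR11 - HR10 - HR01 + 1 and AP = RERI / HR11, where HR11 is the hazard ratio for joint exposure and HR10 and HR01 are the hazard ratios for each exposure alone, all relative to the doubly unexposed group. The sketch below uses hypothetical hazard ratios chosen only for illustration; they are not estimates from the study, and the function names are likewise invented here.

```python
def reri(hr_both: float, hr_pollution_only: float, hr_prs_only: float) -> float:
    """Relative excess risk due to interaction on the additive scale."""
    return hr_both - hr_pollution_only - hr_prs_only + 1.0


def attributable_proportion(hr_both: float, hr_pollution_only: float, hr_prs_only: float) -> float:
    """Share of the joint-exposure hazard ratio attributable to the interaction."""
    return reri(hr_both, hr_pollution_only, hr_prs_only) / hr_both


# HYPOTHETICAL hazard ratios, each relative to low pollution and low PRS.
hr_high_pollution_low_prs = 1.10
hr_low_pollution_high_prs = 1.70
hr_high_pollution_high_prs = 1.95

r = reri(hr_high_pollution_high_prs, hr_high_pollution_low_prs, hr_low_pollution_high_prs)
ap = attributable_proportion(
    hr_high_pollution_high_prs, hr_high_pollution_low_prs, hr_low_pollution_high_prs
)
print(f"RERI = {r:.2f}, AP = {ap:.2f}")
```

A RERI above zero means the joint effect exceeds the sum of the separate effects, which is what the abstract describes as a significant additive interaction.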
Pub Date: 2026-01-12. DOI: 10.1007/s10654-025-01337-3
Catherine Zhou, Antien L Mooyaart, Nikita Hulscher, Thamila Kerkour, Jasper Ouwerkerk, Marieke W J Louwman, Marlies Wakkee, Yunlei Li, Quirinus J M Voorham, Annette Bruggink, Tamar E C Nijsten, Loes M Hollestein
There is a high need for accurate prognostic models for stage II melanoma to determine who may benefit from (neo)adjuvant systemic therapy. The Dutch Early-Stage Melanoma (D-ESMEL) study was designed to identify new prognostic features, in addition to American Joint Committee on Cancer (AJCC) staging, in a population-based sample of stage I/II melanoma patients. The validation cohort of the D-ESMEL study employs a nested case-control design. Initially, controls were randomly sampled to develop prognostic models that included both known and new prognostic factors and to assess the added value of the new factors. As a consequence, most controls had a very thin melanoma (<1.0 mm) while most cases had a thicker melanoma (>2.0 mm). This resulted in insufficient variability and high weights for stage II controls when applying weighted analyses in absolute risk prediction models. Therefore, randomly sampled controls were re-matched on AJCC stage (stage IA, IB, IIA, IIB, IIC), and new stage-matched controls were collected for cases who could not be re-matched. The original D-ESMEL validation cohort included 5,815 stage I/II melanoma patients, of whom 154 developed distant metastasis (cases). Of these 154 cases, 98 were stage II, whereas only 24 stage II controls were included. The stage-matched design now includes 153 stage-matched case-control sets, of which 97 are stage II cases with 97 stage II controls, derived from a population-based cohort of 5,785 stage I/II patients. The updated design increased the biological variability among stage II controls and balanced the weights in weighted analyses, thereby facilitating reliable subgroup analyses in this clinically important subgroup.
An extension of the validation cohort of the Dutch Early-Stage Melanoma (D-ESMEL) study for stage-specific analyses. European Journal of Epidemiology.
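The D-ESMEL abstract's point about "high weights for stage II controls" can be illustrated with a simple inverse-sampling-fraction weight. This is a deliberate simplification and not necessarily the weighting scheme the authors used; nested case-control analyses typically derive weights from inclusion probabilities over risk sets. The control counts (24 and 97) come from the abstract, while the size of the stage II non-case pool is a hypothetical placeholder, since the abstract does not report it.

```python
def inverse_sampling_weight(n_eligible: int, n_sampled: int) -> float:
    """Weight per sampled control: how many eligible non-cases each one represents."""
    if n_sampled <= 0 or n_eligible < n_sampled:
        raise ValueError("need 0 < n_sampled <= n_eligible")
    return n_eligible / n_sampled


# HYPOTHETICAL: the abstract does not report how many stage II patients
# without distant metastasis the source cohort contains.
STAGE_II_NON_CASES = 1_000

# Stage II control counts as reported in the abstract.
for label, n_controls in (("random sampling", 24), ("stage-matched", 97)):
    weight = inverse_sampling_weight(STAGE_II_NON_CASES, n_controls)
    print(f"{label}: {n_controls} stage II controls, weight per control ~ {weight:.1f}")
```

Whatever the true pool size, quadrupling the number of stage II controls cuts each control's weight to roughly a quarter, so no single control dominates a weighted absolute-risk estimate.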
Pub Date: 2026-01-12. DOI: 10.1007/s10654-025-01336-4
Amalie Helme Simoni, Kathrine Hald, Thure Filskov Overvad, Mette Søgaard, Anne Gulbech Ording
The Danish National Patient Registry (DNPR) and the Danish Cancer Registry (DCR) are central to registry-based cancer research. This systematic review evaluates studies assessing the quality of cancer-related data in these registries under their current data structures. PubMed and Embase were systematically searched on January 24, 2025 (PROSPERO: CRD420251005952). Studies validating cancer-related data in the DNPR or DCR against a gold standard were included. Findings were synthesized narratively and categorized by DNPR data, DCR data, or multi-source algorithms. The literature search generated 915 records, of which 50 were included: 23 validated DNPR data, 9 validated DCR data, and 18 validated algorithm performance. DNPR cancer diagnoses and treatments showed positive predictive values (PPVs) of 57-100%, with the highest values for common malignancies and treatments. The quality of DNPR comorbidities and complications varied substantially (PPVs 0-98%). The PPV of a melanoma diagnosis in the DCR was 97%. The DCR staging completeness varied considerably (34-95%). Algorithms presented PPVs of 60-96% for recurrence, active cancer, and recognized metastases, and 28% for unrecognized metastases. The DNPR and DCR provide high-quality data for many cancer diagnoses, treatments, and outcomes, supporting their use in register-based research. While some data elements, including data on complications, exhibit lower quality, algorithmic approaches can enhance utility for less robust data. However, several aspects of cancer-related data remain unvalidated.
Quality of cancer-related data from the Danish National patient registry (1994-2025) and the Danish cancer registry (2004-2025): a systematic review. European Journal of Epidemiology.
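The positive predictive values summarized in the registry review are the share of registry-recorded diagnoses that the gold standard confirms. A minimal sketch follows, with a Wilson score interval added as one common way to express uncertainty; the review does not say which interval method the underlying studies used, and the counts below are hypothetical.

```python
import math


def ppv(confirmed: int, not_confirmed: int) -> float:
    """Positive predictive value: confirmed records over all registry-positive records."""
    total = confirmed + not_confirmed
    if total == 0:
        raise ValueError("no registry-positive records")
    return confirmed / total


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# HYPOTHETICAL validation sample: 100 registry diagnoses, 97 confirmed on chart review.
confirmed, not_confirmed = 97, 3
low, high = wilson_interval(confirmed, confirmed + not_confirmed)
print(f"PPV = {ppv(confirmed, not_confirmed):.2f} (95% CI {low:.3f} to {high:.3f})")
```

PPV says nothing about diagnoses the registry missed; that is a question of completeness or sensitivity, which the review reports separately for items such as DCR staging.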
Pub Date: 2026-01-12. DOI: 10.1007/s10654-025-01346-2
Magdalena Muszynska-Spielauer, Paola Di Giulio, Yuka Minagawa, Vanessa Di Lego, Marc Luy
This study tests the "longevity hypothesis," which posits that women's greater number of years spent in poor health is primarily a direct consequence of their longer survival. We analyse gender differences in unhealthy life years (ULY) at age 50 across 22 European countries in 2015-2017. ULY was estimated using three approaches (the Sullivan method, the cross-sectional average length of healthy life, and multistate life tables) applied to four health indicators of varying severity: chronic diseases, functional limitations, self-rated health, and disability. Data were drawn from the Human Mortality Database and the Survey of Health, Ageing and Retirement in Europe. We decomposed the gender gap in ULY into a "mortality effect" (ME), reflecting differences in life years lived, and a "health effect" (HE), reflecting differences in morbidity prevalence. Women at age 50 lived more unhealthy years than men across almost all health indicators and countries. In most cases, more than half of the gender gap in ULY was attributable to the ME, indicating that women's longer survival primarily explains their greater number of years spent in poor health. The HE showed greater variation across indicators and countries. Results were most consistent for chronic diseases and self-rated health, while functional limitations and disability yielded smaller and less consistent differences. Findings support the longevity hypothesis: women's higher life expectancy is the main driver of their longer lifetime spent in poor health. The variation across health dimensions highlights the importance of distinguishing between them when studying gender inequalities in health.
Why do women live longer than men, but spend more time in poor health? A decomposition analysis of the gender gap in unhealthy life years across Europe. European Journal of Epidemiology.
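The split of the gender gap in ULY into a mortality effect and a health effect can be sketched with the Sullivan method, in which ULY is the sum over age groups of person-years lived multiplied by the prevalence of poor health. The decomposition below is a generic two-factor (Kitagawa-style) decomposition that is exactly additive; the abstract does not spell out the authors' formula, so treat this as an assumption. All numbers are hypothetical, and the person-years are expressed per person alive at age 50.

```python
def sullivan_uly(person_years, prevalence):
    """Unhealthy life years: person-years lived, weighted by prevalence of poor health."""
    return sum(years * prev for years, prev in zip(person_years, prevalence))


def decompose_gap(py_women, prev_women, py_men, prev_men):
    """Split ULY(women) - ULY(men) into a mortality effect and a health effect."""
    groups = list(zip(py_women, prev_women, py_men, prev_men))
    # Difference in years lived, evaluated at the average prevalence of both sexes.
    mortality_effect = sum((lw - lm) * (pw + pm) / 2 for lw, pw, lm, pm in groups)
    # Difference in prevalence, evaluated at the average years lived of both sexes.
    health_effect = sum((pw - pm) * (lw + lm) / 2 for lw, pw, lm, pm in groups)
    return mortality_effect, health_effect


# HYPOTHETICAL inputs for age groups 50-59, 60-69 and 70+.
py_men, prev_men = [9.6, 8.5, 9.0], [0.20, 0.35, 0.55]
py_women, prev_women = [9.8, 9.2, 13.0], [0.22, 0.38, 0.60]

gap = sullivan_uly(py_women, prev_women) - sullivan_uly(py_men, prev_men)
me, he = decompose_gap(py_women, prev_women, py_men, prev_men)
print(f"gap = {gap:.2f} years, ME = {me:.2f} ({me / gap:.0%}), HE = {he:.2f} ({he / gap:.0%})")
```

Because each effect is evaluated at the average of the other factor, ME and HE sum to the total gap with no residual term.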
Pub Date: 2026-01-12. DOI: 10.1007/s10654-025-01338-2
Marius Johansen, Tone Kristin Omsland, Katariina Laine, Siri Eldevik Håberg, Maria Christine Magnus
Women with endometriosis have a higher burden of anxiety and depression. Whether they are at increased risk of postpartum depression (PPD) remains unclear. We aimed to compare the risk of PPD between women with and without endometriosis and to explore mediation by a history of major depression and infertility. In a population-based cohort study, we compared 1,159 singleton pregnancies among women with self-reported endometriosis and 74,590 pregnancies among women without endometriosis. We calculated adjusted risk ratios (aRR) with 95% confidence intervals (CI) using multivariable log-binomial regression, adjusting for age, body mass index, education and income. Mediation analyses assessed the indirect effect of any history of major depression or infertility. Women with endometriosis had a higher risk of PPD (aRR: 1.34, 95% CI: 1.15-1.55). Mediation analyses indicated that a large part of this association was explained by a higher lifetime prevalence of major depression among women with endometriosis (natural direct effect of endometriosis: aRR: 1.17, 95% CI: 1.00-1.36; natural indirect effect of any history of major depression: aRR: 1.14, 95% CI: 1.08-1.20), with a proportion mediated of 49.3%. Infertility demonstrated a negative natural indirect effect on the association between endometriosis and PPD (aRR: 0.87, 95% CI: 0.81-0.94). Women with endometriosis had an elevated risk of PPD, which was largely explained by a higher lifetime prevalence of major depression. Our findings suggest that they constitute a high-risk group and could benefit from closer follow-up to facilitate early identification and intervention.
Risk of postpartum depression among women with endometriosis: the Norwegian mother, father and child cohort study (MoBa). European Journal of Epidemiology.
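The mediation figures in the endometriosis abstract can be checked with the usual ratio-scale identities: the total effect is the product of the natural direct effect (NDE) and the natural indirect effect (NIE), and the proportion mediated is NDE x (NIE - 1) / (NDE x NIE - 1). The sketch below plugs in the rounded point estimates from the abstract. Whether the authors computed the proportion mediated in exactly this way is an assumption, and the function names are illustrative.

```python
def total_effect_rr(nde: float, nie: float) -> float:
    """On the ratio scale the total effect is the product of NDE and NIE."""
    return nde * nie


def proportion_mediated_rr(nde: float, nie: float) -> float:
    """Proportion mediated on the risk-ratio scale."""
    total = total_effect_rr(nde, nie)
    if total == 1.0:
        raise ValueError("total effect is null; proportion mediated is undefined")
    return nde * (nie - 1.0) / (total - 1.0)


# Point estimates for mediation by history of major depression, as reported.
NDE, NIE = 1.17, 1.14

print(f"implied total effect: {total_effect_rr(NDE, NIE):.2f}")
print(f"implied proportion mediated: {proportion_mediated_rr(NDE, NIE):.1%}")
```

With these inputs the implied total effect is about 1.33 and the implied proportion mediated is about 49%, close to the reported 1.34 and 49.3%; the small gaps are what one would expect from working with rounded estimates.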
Pub Date: 2026-01-12 DOI: 10.1007/s10654-025-01341-7
Kaicheng Wang, Lindsey Rosman, Haidong Lu
Machine learning (ML) algorithms are increasingly used to estimate propensity scores with the expectation of improving causal inference. However, the validity of data-driven ML-based approaches for confounder selection and adjustment remains unclear. In this study, we emulated the device-stratified secondary analysis of the PARADIGM-HF trial among U.S. veterans with heart failure and implanted cardiac devices from 2016 to 2020. We benchmarked observational estimates from three propensity score approaches against the trial results: (1) logistic regression with pre-specified confounders, (2) generalized boosted models (GBM) using the same pre-specified confounders, and (3) GBM with expanded covariates and automated feature selection. The logistic regression-based propensity score approach yielded estimates closest to the trial (HR = 0.93, 95% CI 0.61-1.42; 23-month RR = 0.86, 95% CI 0.57-1.24 vs. trial HR = 0.81, 95% CI 0.61-1.06). Despite better predictive performance, GBM with pre-specified confounders showed no improvement over the logistic regression approach (HR = 0.97, 95% CI 0.68-1.37; RR = 0.96, 95% CI 0.89-1.98). Moreover, GBM with expanded covariates and data-driven automated feature selection substantially increased bias (HR = 0.61, 95% CI 0.30-1.23; RR = 0.69, 95% CI 0.36-1.04). Our findings suggest that ML-based propensity score methods do not inherently improve causal estimation, possibly due to residual confounding from omitted or partially adjusted variables, and may introduce overadjustment bias when combined with automated feature selection. These results underscore the importance of careful confounder specification and causal reasoning over algorithmic complexity in causal inference.
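As a rough illustration of the first approach named in the abstract (logistic regression with pre-specified confounders), the sketch below fits a propensity score on synthetic data and applies inverse probability of treatment weighting (IPTW). Everything here is an assumption made for illustration: the single confounder `x`, the simulated coefficients, the gradient-ascent fitting routine, and the use of IPTW itself, since the abstract does not say how the scores were applied. It is not the study's data or code.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ts, lr=1.0, steps=500):
    """Fit P(T=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            resid = t - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Synthetic cohort: x confounds treatment and outcome; treatment has no true effect.
random.seed(0)
n = 1000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ts = [1 if random.random() < sigmoid(0.8 * x) else 0 for x in xs]
ys = [1 if random.random() < sigmoid(-1.0 + 0.7 * x) else 0 for x in xs]

# Propensity scores and IPTW weights (1/p for treated, 1/(1-p) for untreated).
b0, b1 = fit_logistic(xs, ts)
ps = [sigmoid(b0 + b1 * x) for x in xs]
weights = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t in zip(ps, ts)]


def arm_mean(values, arm, use_weights):
    """(Weighted) mean of `values` among participants in treatment arm `arm`."""
    idx = [i for i in range(n) if ts[i] == arm]
    w = [weights[i] if use_weights else 1.0 for i in idx]
    return sum(values[i] * wi for i, wi in zip(idx, w)) / sum(w)


imbalance_crude = arm_mean(xs, 1, False) - arm_mean(xs, 0, False)
imbalance_iptw = arm_mean(xs, 1, True) - arm_mean(xs, 0, True)
rr_crude = arm_mean(ys, 1, False) / arm_mean(ys, 0, False)
rr_iptw = arm_mean(ys, 1, True) / arm_mean(ys, 0, True)

print(f"Confounder imbalance: crude {imbalance_crude:.3f}, IPTW {imbalance_iptw:.3f}")
print(f"Risk ratio: crude {rr_crude:.2f}, IPTW {rr_iptw:.2f}")
```

Because the simulated treatment has no effect, a well-specified propensity model should pull the weighted risk ratio toward 1 and shrink the confounder imbalance. The example only shows the mechanics; it says nothing about how GBM-based scores would behave.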
{"title":"Machine learning versus logistic regression for propensity score estimation: a trial emulation benchmarked against the PARADIGM-HF randomized trial.","authors":"Kaicheng Wang, Lindsey Rosman, Haidong Lu","doi":"10.1007/s10654-025-01341-7","DOIUrl":"https://doi.org/10.1007/s10654-025-01341-7","url":null,"abstract":"Machine learning (ML) algorithms are increasingly used to estimate propensity score with expectation of improving causal inference. However, the validity of data-driven ML-based approaches for confounder selection and adjustment remains unclear. In this study, we emulated the device-stratified secondary analysis of the PARADIGM-HF trial among U.S. veterans with heart failure and implanted cardiac devices from 2016 to 2020. We benchmarked observational estimates from three propensity score approaches against the trial results: (1) logistic regression with pre-specified confounders, (2) generalized boosted models (GBM) using the same pre-specified confounders, and (3) GBM with expanded covariates and automated feature selection. Logistic regression-based propensity score approach yielded estimates closest to the trial (HR = 0.93, 95% CI 0.61-1.42; 23-month RR = 0.86, 95% CI 0.57-1.24 vs. trial HR = 0.81, 95% CI 0.61-1.06). Despite better predictive performance, GBM with pre-specified confounders showed no improvement over the logistic regression approach (HR = 0.97, 95% CI 0.68-1.37; RR = 0.96, 95% CI 0.89-1.98). Moreover, GBM with expanded covariates and data-driven automated feature selection substantially increased bias (HR = 0.61, 95% CI 0.30-1.23; RR = 0.69, 95% CI 0.36-1.04). Our findings suggest that ML-based propensity score methods do not inherently improve causal estimation possibly due to residual confounding from omitted or partially adjusted variables and may introduce overadjustment bias when combined with automated feature selection. 
These results underscore the importance of careful confounder specification and causal reasoning over algorithmic complexity in causal inference.","PeriodicalId":11907,"journal":{"name":"European Journal of Epidemiology","volume":" ","pages":""},"PeriodicalIF":5.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145951526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-12 DOI: 10.1007/s10654-025-01324-8
Yuankai Zhang, Roby Joehanes, Tianxiao Huan, Lukas M Weber, Qiong Yang, Kathryn L Lunetta, Daniel Levy, Chunyu Liu
Mendelian randomization has emerged as a powerful tool for exploring causal relationships in observational studies by using genetic variants as instrumental variables. While multivariable Mendelian randomization extends this approach to simultaneously address multiple exposures, it faces significant challenges with highly correlated exposures, particularly in high-dimensional settings such as multi-omics data. Conventional MVMR methods, which are primarily based on linear regression models, may suffer from multicollinearity and reduced statistical power when analyzing correlated exposures. The increasing availability of high-dimensional multi-omics data has highlighted the limitations of conventional MVMR approaches in analyzing correlated exposures while maintaining biological interpretability. To address these challenges, we propose integrating latent factor analysis into the MVMR framework, enabling dimension reduction without compromising interpretability. Through extensive simulation studies, we demonstrate that our method maintains a well-controlled false positive rate and offers superior sensitivity compared to conventional MVMR approaches. We apply our method to investigate the causal relationship between DNA methylation and mitochondrial DNA copy number. Our method offers a significant advantage in scenarios with highly correlated exposures driven by common latent factors or shared pathways, especially when individual effects are sparse. By applying our method to correlated multi-omics data, we can uncover new insights into the molecular mechanisms underlying complex phenotypes.
{"title":"Integration of latent factor analysis into multivariable Mendelian randomization.","authors":"Yuankai Zhang,Roby Joehanes,Tianxiao Huan,Lukas M Weber,Qiong Yang,Kathryn L Lunetta,Daniel Levy,Chunyu Liu","doi":"10.1007/s10654-025-01324-8","DOIUrl":"https://doi.org/10.1007/s10654-025-01324-8","url":null,"abstract":"Mendelian randomization has emerged as a powerful tool for exploring causal relationships in observational studies by using genetic variants as instrumental variables. While multivariable Mendelian randomization extends this approach to simultaneously address multiple exposures, it faces significant challenges with highly correlated exposures, particularly in high-dimensional settings such as multi-omics data. Conventional MVMR methods, which are primarily based on linear regression models, may suffer from multicollinearity and reduced statistical power when analyzing correlated exposures. The increasing availability of high-dimensional multi-omics data has highlighted the limitations of conventional MVMR approaches in analyzing correlated exposures while maintaining biological interpretability. To address these challenges, we propose integrating latent factor analysis into the MVMR framework, enabling dimension reduction without compromising interpretability. Through extensive simulation studies, we demonstrate that our method maintains a well-controlled false positive rate and offers superior sensitivity compared to conventional MVMR approaches. We apply our method to investigate the causal relationship between DNA methylation and mitochondrial DNA copy number. Our method offers a significant advantage in scenarios with highly correlated exposures driven by common latent factors or shared pathways, especially when individual effects are sparse. 
By applying our method to correlated multi-omics data, we can uncover new insights into the molecular mechanisms underlying complex phenotypes.","PeriodicalId":11907,"journal":{"name":"European Journal of Epidemiology","volume":"39 1","pages":""},"PeriodicalIF":13.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145949748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-12 DOI: 10.1007/s10654-025-01335-5
Sif E Carlsen, Emily Jarden, Caroline H Hemmingsen, Lone Schmidt, Sarah Hjorth, Maarit K Leinonen, Ulrika Nörby, Lina S Mørch, Susanne K Kjaer, Hedvig Nordeng, Marie Hargreave
Observational studies have linked maternal hormonal contraception use to childhood cancer risk, but findings are inconsistent. We conducted a systematic review of this potential relationship. A systematic search was performed in the PubMed, Embase, Scopus, Cochrane, and Web of Science databases until April 9, 2025. Studies reporting maternal hormonal contraception use before or during pregnancy and childhood cancer risk (0-19 years) were eligible. We included studies providing risk estimates in English or Scandinavian languages. The Newcastle-Ottawa Scale was used to assess study quality. Meta-analysis using fixed and random effects was used to pool relative risks (RRs) with 95% confidence intervals (CIs) for childhood cancer according to maternal hormonal contraception use (1) up to or during pregnancy, and (2) exclusively during pregnancy. We included 27 studies (24 case-control and 3 cohort), totaling 11,067 childhood cancer cases. Maternal hormonal contraception use up to and during pregnancy increased the risk of any childhood cancer (RR = 1.18; 95% CI = 1.10-1.26), leukemia (RR = 1.24; 95% CI = 1.06-1.45), and lymphoid leukemia (RR = 1.17; 95% CI = 1.06-1.28). Exposure during pregnancy showed higher risk estimates for any cancer (RR = 1.32; 95% CI = 1.12-1.56) and leukemia (RR = 1.63; 95% CI = 1.07-2.49). Most studies were of moderate (70%) or high (26%) quality. Maternal hormonal contraception use may increase childhood cancer risk, particularly for leukemia and with exposure during pregnancy. Further prospective studies are needed, focusing on specific hormonal contraception substances and exposure timing.
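The fixed- and random-effects pooling mentioned in the abstract can be sketched with inverse-variance weighting on the log-RR scale. The abstract does not name the estimators used, so the DerSimonian-Laird between-study variance below is an assumption, and the three study-level inputs are hypothetical values invented for illustration, not estimates from the review.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_rr_and_se(rr, lower, upper):
    """Recover log(RR) and its standard error from a reported 95% CI."""
    return math.log(rr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool(studies):
    """Pool (rr, lower, upper) tuples.

    Returns (fixed-effect RR, random-effects RR, tau^2), where tau^2 is the
    DerSimonian-Laird estimate of between-study variance.
    """
    pairs = [log_rr_and_se(*s) for s in studies]
    ys = [y for y, _ in pairs]
    vs = [se ** 2 for _, se in pairs]
    w = [1.0 / v for v in vs]

    # Fixed effect: inverse-variance weighted mean of the log RRs.
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird tau^2 (truncated at zero).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    # Random effects: weights also include the between-study variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    rand = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return math.exp(fixed), math.exp(rand), tau2


# Hypothetical study-level estimates (RR, lower 95% CI, upper 95% CI).
hypothetical_studies = [(1.10, 0.90, 1.34), (1.30, 1.05, 1.61), (1.05, 0.80, 1.38)]
fixed_rr, random_rr, tau2 = pool(hypothetical_studies)
print(f"Fixed RR {fixed_rr:.2f}, random RR {random_rr:.2f}, tau^2 {tau2:.4f}")
```

When the studies agree closely, tau² is truncated to zero and the two pooled estimates coincide; they diverge only when heterogeneity exceeds what sampling error alone would produce.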
{"title":"Maternal hormonal contraception use and childhood cancer risk: a systematic review and meta-analysis.","authors":"Sif E Carlsen,Emily Jarden,Caroline H Hemmingsen,Lone Schmidt,Sarah Hjorth,Maarit K Leinonen,Ulrika Nörby,Lina S Mørch,Susanne K Kjaer,Hedvig Nordeng,Marie Hargreave","doi":"10.1007/s10654-025-01335-5","DOIUrl":"https://doi.org/10.1007/s10654-025-01335-5","url":null,"abstract":"Observational studies have linked maternal hormonal contraception use to childhood cancer risk, but findings are inconsistent. A systematic review was conducted of this potential relationship. A systematic search was performed in PubMed, Embase, Scopus, Cochrane, and Web of Science databases until April 9, 2025. Studies reporting maternal hormonal contraception use before or during pregnancy and childhood cancer risk (0-19 years) were eligible. We included studies providing risk estimates in English or Scandinavian languages. Newcastle-Ottawa Scale was used to assess study quality. Meta-analysis using fixed and random effects was used to pool relative risks (RRs) with 95% confidence intervals (CIs) for childhood cancer according to maternal hormonal contraception use (1) up to or during pregnancy, and (2) exclusively during pregnancy. We included 27 studies (24 case-control and 3 cohort), totaling 11,067 childhood cancer cases. Maternal hormonal contraception use up to and during pregnancy increased risk of any childhood cancer (RR = 1.18; 95% CI = 1.10-1.26), leukemia (RR = 1.24; 95% CI = 1.06-1.45), and lymphoid leukemia (RR = 1.17; 95% CI = 1.06-1.28). Exposures during pregnancy showed higher risk estimate for any cancer (RR = 1.32; 95% CI = 1.12-1.56) and leukemia (RR = 1.63; 95% CI = 1.07-2.49). Most studies were moderate (70%) or high (26%) quality. Maternal hormonal contraception use may increase childhood cancer risk, particularly for leukemia, and during pregnancy. 
Further prospective studies are needed, focusing on specific hormonal contraception substances and exposure timing.","PeriodicalId":11907,"journal":{"name":"European Journal of Epidemiology","volume":"255 1","pages":""},"PeriodicalIF":13.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145949749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}