Pub Date: 2025-10-01 | Epub Date: 2025-07-16 | DOI: 10.1016/j.annepidem.2025.07.017
Yafei Wu, Harry Qin, Shengnan Wang, Qingling Yang, Yan Zhang, Harry Haoxiang Wang, Yao Jie Xie
Purpose: To identify age-specific predictors of migraine prevalence among Chinese women.
Methods: In this cross-sectional analysis, 54 predictors were collected from the MECH-HK cohort. Migraine was assessed using the criteria of the International Classification of Headache Disorders, 3rd edition (ICHD-3). Machine learning was employed to select a streamlined subset of predictors. For analysis, participants were categorised into a young and middle-aged group (<60 years) and an older group (≥60 years).
Results: The mean age of participants was 54.3 years. Migraine prevalence was higher in women under 60 than in older women (10.7 % vs. 6.0 %, P < 0.001). Lasso selected seven predictors for women under 60 and twelve for women aged 60 or older. The top three predictors among women under 60 were fatigue, migraine family history, and Pittsburgh Sleep Quality Index (PSQI) score, explaining 6.6 %, 5.0 %, and 4.9 % of variation, respectively. Their ORs (95 % CIs) were 1.61 (1.37-1.89), 3.93 (2.77-5.57), and 1.29 (1.12-1.48), respectively. For older women, the top three predictors were experience of hunger, smartphone usage time, and migraine family history, explaining 2.0 %, 1.8 %, and 1.6 % of variation, respectively, with ORs (95 % CIs) of 2.16 (1.21-3.84), 1.24 (1.03-1.48), and 2.26 (1.16-4.40), respectively.
Conclusion: Migraine family history and experience of hunger were shared predictors of migraine prevalence in both age groups. Other predictors influenced migraine prevalence differently across age groups.
Title: Predictors of migraine prevalence among different age groups in Hong Kong Chinese women: Machine learning analyses on the MECH-HK cohort. Journal: Annals of Epidemiology, pages 34-42. Publication type: Journal Article.
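The odds ratios and confidence intervals reported in this abstract can be related back to logistic-regression coefficients. The sketch below is illustrative only: it assumes the reported intervals are Wald intervals, symmetric on the log-odds scale, which the abstract does not state. The function names are hypothetical, not taken from the paper.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95 % interval


def log_odds_from_or(or_point, ci_low, ci_high):
    """Recover a log-odds coefficient and its standard error from an OR and 95 % CI.

    Assumption: the interval is a Wald interval, exp(beta +/- z * se).
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def or_with_ci(beta, se):
    """Map a coefficient and standard error back to an OR with a Wald 95 % CI."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


# Migraine family history among women under 60, as reported: OR 3.93 (2.77-5.57).
beta, se = log_odds_from_or(3.93, 2.77, 5.57)
point, low, high = or_with_ci(beta, se)
print(f"beta={beta:.3f} se={se:.3f} OR={point:.2f} ({low:.2f}-{high:.2f})")
```

If the round trip reproduces the published bounds to two decimals, the Wald assumption is at least consistent with the reported numbers; it does not confirm how the authors computed them.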
Pub Date: 2025-10-01 | Epub Date: 2025-07-13 | DOI: 10.1016/j.annepidem.2025.07.002
Judith J M Rijnhart, Ryan J Bailey, Jessica Agbodo, Vishakha Agrawal, Valerie M Rodriguez-Olmo, Jason L Salemi
Purpose: To describe three statistical approaches that help gain a comprehensive understanding of mechanisms underlying health inequities: univariate regression analysis, effect modification analysis, and mediation analysis.
Methods: We described how univariate regression analysis, effect modification analysis, and mediation analysis can be used to gain insight into mechanisms underlying health inequities. We demonstrated the application of these approaches using a motivating example from the Health and Retirement Study in which we studied the role of education in ethnic disparities in episodic memory.
Results: Univariate regression analysis showed that Hispanic individuals had, on average, lower episodic memory scores than non-Hispanic individuals. Effect modification analysis showed that the beneficial effect of education on episodic memory was weaker in Hispanic than in non-Hispanic individuals. Mediation analysis showed that the ethnic disparity in episodic memory was driven not only by effect modification but also by differences in the distribution of education years across ethnic groups.
Conclusion: The combined study of effect modification and mediation provides a comprehensive understanding of the mechanisms that cause and sustain health inequities. Insight into these mechanisms is crucial to determine targets for interventions and policies aimed at eliminating health inequities.
Title: Leveraging mediation analysis as a tool to study mechanisms underlying health inequities. Journal: Annals of Epidemiology, pages 1-6. Publication type: Journal Article.
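The way effect modification and mediation combine in this abstract can be sketched with a standard decomposition for linear models with an exposure-mediator interaction. Everything below is an assumption for illustration: the model forms, the coefficient names (`theta1`, `theta2`, `theta3`, `beta0`, `beta1`), and the numbers are hypothetical and are not estimates from the Health and Retirement Study. The decomposition also relies on the usual no-unmeasured-confounding conditions for mediation analysis.

```python
def natural_effects(theta1, theta2, theta3, beta0, beta1, x_ref=0.0, x_cmp=1.0):
    """Split a total effect into natural direct and indirect parts.

    Assumed (hypothetical) linear models:
        mediator: E[M | X]    = beta0 + beta1 * X
        outcome:  E[Y | X, M] = theta0 + theta1 * X + theta2 * M + theta3 * X * M
    """
    dx = x_cmp - x_ref
    # Direct effect: change X while M stays at its expected level under x_ref.
    nde = (theta1 + theta3 * (beta0 + beta1 * x_ref)) * dx
    # Indirect effect: shift M from its x_ref level to its x_cmp level, X held at x_cmp.
    nie = (theta2 + theta3 * x_cmp) * beta1 * dx
    return nde, nie


def total_effect(theta1, theta2, theta3, beta0, beta1, x_ref=0.0, x_cmp=1.0):
    """Total effect from the implied E[Y | X]; the outcome intercept cancels."""
    def mean_y(x):
        return theta1 * x + (theta2 + theta3 * x) * (beta0 + beta1 * x)
    return mean_y(x_cmp) - mean_y(x_ref)


# Made-up coefficients: X = group indicator, M = years of education, Y = memory score.
coefs = dict(theta1=-0.2, theta2=0.30, theta3=-0.10, beta0=13.0, beta1=-2.0)
nde, nie = natural_effects(**coefs)
total = total_effect(**coefs)
print(f"direct={nde:.2f} indirect={nie:.2f} total={total:.2f}")
```

The interaction coefficient `theta3` enters both parts, which is why the disparity can reflect a weaker return to education and a different distribution of education at the same time.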
Pub Date: 2025-10-01 | Epub Date: 2025-07-22 | DOI: 10.1016/j.annepidem.2025.07.021
Alina Schnake-Mahl, Ana V Diez Roux, Bian Liu, Louisa W Holaday, Albert Siu, Edwin McCulley, Usama Bilal, Katherine A Ornstein
Purpose: Both hospitals and neighborhoods likely play important roles in driving health outcomes and inequities, but there has been limited prior research examining both contexts simultaneously. In this analysis we examine the contributions of these two critical contexts, neighborhoods and hospitals, to variation in in-hospital mortality and mortality disparities.
Methods: We used cross-classified multilevel models, a statistical technique that can incorporate data from multiple non-nested levels, to examine the contributions of neighborhoods and hospitals to variation in in-hospital mortality. Our study focuses on COVID-19 in-hospital mortality in New York State in 2020 as a methodological case study of cross-classified multilevel modeling, given the well-documented variation in COVID-19 in-hospital mortality across contexts.
Results: We found that nearly one in five patients hospitalized for COVID-19 died, and there was substantial variation in risk of in-hospital mortality by neighborhoods and hospitals, with more variation across hospitals (τ₀₀ = 0.29) than across neighborhoods (τ₀₀ = 0.02). Neighborhoods did not explain hospital variability and vice versa: both contexts appeared to contribute independently to in-hospital mortality rates. We also found that several hospital, neighborhood, and individual factors were associated with in-hospital mortality disparities in fully adjusted models: lower hospital quality, safety-net hospital status, social vulnerability, older age, not having private insurance, and being Hispanic or non-Hispanic other.
Conclusions: Our findings suggest the importance of simultaneously considering hospital and neighborhood contexts to understand in-hospital outcome disparities. Understanding the contribution of these critical contexts has important implications for targeting interventions to ensure equitable hospital outcomes despite inequities in neighborhood and hospital contexts.
Title: Where you live and where you receive care: Using cross-classified multilevel modeling to examine hospital and neighborhood variation in in-hospital mortality and mortality disparities. Journal: Annals of Epidemiology, pages 16-22. Publication type: Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12750337/pdf/
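To give the two reported variance components some scale, the sketch below converts them into variance partition coefficients and median odds ratios. This is an illustration under stated assumptions, not a result from the paper: it assumes the τ₀₀ values are random-intercept variances on the logit scale from a cross-classified logistic model, and it uses the latent-variable convention in which the individual-level variance is π²/3. The abstract reports neither quantity, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

# Individual-level variance of a logistic model on the latent (logit) scale.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def variance_partition(tau_hospital, tau_neighborhood):
    """Share of latent-scale variance attributable to each context."""
    total = tau_hospital + tau_neighborhood + LOGISTIC_RESIDUAL_VARIANCE
    return tau_hospital / total, tau_neighborhood / total


def median_odds_ratio(tau):
    """Median OR between two randomly chosen clusters with intercept variance tau."""
    return math.exp(math.sqrt(2 * tau) * NormalDist().inv_cdf(0.75))


# Variance components as reported in the abstract.
vpc_hospital, vpc_neighborhood = variance_partition(0.29, 0.02)
mor_hospital = median_odds_ratio(0.29)
mor_neighborhood = median_odds_ratio(0.02)
print(f"VPC hospital={vpc_hospital:.3f} neighborhood={vpc_neighborhood:.3f}")
print(f"MOR hospital={mor_hospital:.2f} neighborhood={mor_neighborhood:.2f}")
```

Because the two random effects are crossed rather than nested, each variance enters the denominator once; a nested model would partition the variance differently.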
Pub Date: 2025-10-01 | Epub Date: 2025-07-16 | DOI: 10.1016/j.annepidem.2025.07.022
Christine M Forke, Laura G Barr, Laura Sinko, Melissa E Dichter, Peter F Cronholm
Purpose: To add to existing knowledge on relationships between Conventionally-identified Adverse Childhood Experiences (ACEs) and adolescent reproductive health (ARH) outcomes, we identified contributions of Expanded (community-level) ACEs, integrating measures of ACE co-occurrence and burden.
Methods: Secondary analysis of 2012-2013 Philadelphia ACEs data from a population-based adult sample. Weighted regressions, adjusted for age, sex, race/ethnicity, and socioeconomic status, tested associations between Conventional and Expanded ACEs (separately and co-occurring) and ACE burden (lowest to highest exposure) with: early sexarche (<15 years), adolescent pregnancy (<19 years), and unintended adolescent pregnancy.
Results: Conventional ACEs showed strong dose-response relationships with all outcomes (aOR range: 2.04-4.96, p < 0.05). Expanded ACEs were associated with early sexarche (aOR=2.50; 95 % CI: 1.27, 4.94), adolescent pregnancy (aOR=1.69; 95 % CI: 1.16, 2.46), and unintended adolescent pregnancy (aOR=1.54; 95 % CI: 1.04, 2.29); dose-response patterns were inconsistent. Co-occurring Conventional and Expanded ACEs produced the greatest odds for all outcomes except early sexarche (aOR range: 3.20-14.97, p < 0.05).
Conclusions: Conventional and Expanded ACEs are important independently and jointly. ARH outcomes peaked when Conventional and Expanded ACEs co-occurred and both exposures were high. Results suggest that associations attributed to Conventional ACEs may be overestimated when these are assessed in isolation, highlighting the importance of considering Expanded ACEs to minimize bias and target appropriate interventions.
Title: Adverse childhood experiences (ACEs) and adolescent reproductive health: Differentiating household and community adversity. Journal: Annals of Epidemiology, pages 7-15. Publication type: Journal Article.
Pub Date: 2025-10-01 | Epub Date: 2025-07-22 | DOI: 10.1016/j.annepidem.2025.07.025
Mathias Ausserwinkler, Maria Flamm, Sophie Gensluckner, Kathrin Bogensberger, Bernhard Paulweber, Eugen Trinka, Patrick Langthaler, Christian Datz, Boris Lindner, Bernhard Iglseder, Elmar Aigner, Bernhard Wernly
Introduction: Austria, a country with a high standard of living and a well-developed healthcare system, still experiences socioeconomic status (SES) disparities that impact health outcomes. Rheumatoid arthritis (RA) is a chronic autoimmune disease associated with significant disability and comorbidities. While SES has been linked to RA prevalence and disease severity, its role in a high-income country like Austria remains underexplored. This study investigates the association between SES factors (education, income, employment status and migration background) and RA prevalence and outcomes.
Methods: This population-based, cross-sectional study used data from the Paracelsus 10,000 cohort in Salzburg, Austria. A total of 9256 participants aged 40-77 years were analyzed, including 289 individuals diagnosed with RA based on the ACR/EULAR classification criteria. SES was assessed through self-reported education, income, employment status and country of birth. Logistic regression models were used to evaluate the association between SES and RA, adjusting for age, sex, metabolic syndrome, smoking and alcohol consumption.
Results: RA prevalence was significantly lower among individuals with higher education (OR = 0.55, 95 % CI: 0.37-0.82 for medium education; OR = 0.41, 95 % CI: 0.25-0.68 for high education). Lower household income correlated with higher RA prevalence. Employment disparities were evident, with RA patients exhibiting higher rates of unemployment and work disability.
Conclusion: Despite Austria's high standard of living, SES remains a key determinant of RA prevalence. Lower levels of education, income and employment are associated with higher rates of RA, highlighting the need for targeted public health interventions. Strengthening healthcare access, promoting early screening and offering economic support to vulnerable groups could be important steps toward reducing these disparities. Further research should explore the underlying mechanisms of this association and examine whether socioeconomic disparities also influence disease progression and patient outcomes.
Title: Exploring the link between socioeconomic factors and rheumatoid arthritis: Insights from a large Austrian study. Journal: Annals of Epidemiology, pages 66-71. Publication type: Journal Article.
Pub Date: 2025-10-01 | Epub Date: 2025-07-26 | DOI: 10.1016/j.annepidem.2025.07.060
L M de Groot, J W R Twisk, A A L Kok, M W Heymans
Purpose: Clinical prediction models benefit from longitudinal data. While the predictive value of a predictor's mean and change over time is well-established, the role of variability around this change is underexplored. Machine Learning methods can be effective in analyzing longitudinal data with long follow-up periods. This study evaluated the predictive value of mean, change, and variability, comparing Random Forest, Lasso regression, and logistic regression.
Methods: We compared models including only mean and change to models also incorporating variability. Predictor selection, interpretability, and performance were compared across methods. Performance was assessed using AUC, sensitivity, specificity, PPV, NPV, and calibration. Data were drawn from the Longitudinal Aging Study Amsterdam to predict depression using 81 longitudinal parameters. Models were trained on 70 % and validated on 30 % of the data. To ensure robustness, analyses were repeated over 500 random splits, and aggregated results were reported.
Results: Including variability improved AUCs for all methods. Predictor selection overlapped across models, and regression coefficients aligned with Random Forest partial dependence plots. Lasso showed the highest training AUC but poorer test performance, while logistic regression and Random Forest showed more stable results. Calibration was acceptable, though predicted risks remained below 0.6.
Conclusion: Machine Learning methods did not outperform logistic regression. Nonetheless, incorporating variability in longitudinal predictors enhances prediction, especially when predictors are expected to change over time, e.g., in ageing populations.
Title: Incorporating longitudinal variability in prediction models: A comparison of machine learning and logistic regression in a cohort study with long follow-up. Journal: Annals of Epidemiology, pages 51-65. Publication type: Journal Article.
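The three longitudinal summaries compared in this abstract (mean, change, and variability) can be illustrated for a single participant and a single predictor. The sketch below is an assumption about how such features might be built: change is taken as the slope of a per-person least-squares line, and variability as the residual standard deviation around that line. The abstract does not give the authors' exact definitions, and the function name is hypothetical.

```python
import math


def summarize_trajectory(times, values):
    """Summarize repeated measurements as mean, change (slope), and variability.

    Variability is the residual standard deviation around the fitted linear
    trend (n - 2 degrees of freedom), so at least three measurements are needed.
    """
    n = len(values)
    if n != len(times) or n < 3:
        raise ValueError("need at least three paired measurements")
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be identical")
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, values))
    return {
        "mean": y_bar,
        "change": slope,
        "variability": math.sqrt(rss / (n - 2)),
    }


# Two made-up trajectories with the same mean but different stability.
steady = summarize_trajectory([0, 1, 2, 3], [1, 2, 3, 4])
zigzag = summarize_trajectory([0, 1, 2, 3], [1, 3, 2, 4])
print(steady)
print(zigzag)
```

The two toy trajectories share a mean and have similar slopes, so only the variability feature separates them, which is the kind of information a mean-and-change model would miss.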
Pub Date: 2025-10-01 | DOI: 10.1016/j.annepidem.2025.09.022
Masanori Kuroki
Purpose
To assess the changing predictive importance of lesbian, gay, bisexual, and transgender (LGBT) status on mental health outcomes between 2014 and 2023.
Methods
We utilized data from the Behavioral Risk Factor Surveillance System (BRFSS) and employed two ensemble methods—random forests and gradient boosting—as well as traditional logistic regression, to analyze the predictive power of various factors, including LGBT status, on frequent mental distress. Frequent mental distress was defined as experiencing poor mental health for 14 or more days during the previous 30 days.
Results
Our analysis revealed a significant and consistent increase in the predictive importance of LGBT status on frequent mental distress across all three modeling approaches. Specifically, LGBT status rose from the 8th or 13th most important predictor in 2014 to the 3rd or 5th most important in 2023, depending on the model. This trend demonstrates that sexual orientation and gender identity (SOGI) has become one of the most influential factors for predicting mental health challenges in recent years.
Conclusions
These findings highlight the growing importance of sexual orientation and gender identity (SOGI) as a risk factor for mental health challenges.
Title: The rising predictive power of LGBT identity in mental health: An analysis of variable importance. Journal: Annals of Epidemiology, volume 111, pages 102-106. Publication type: Journal Article.
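Ranking predictors by importance, as described in this abstract, can be sketched with permutation importance, a model-agnostic measure that works for random forests, gradient boosting, and logistic regression alike. This is an assumed stand-in: the abstract does not say which importance measure the author used. The toy data, the `predict` function, and the feature names are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=30, seed=0):
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(y_true, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the outcome
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(y_true, [predict(r) for r in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances


def rank_features(names, importances):
    """Feature names ordered from most to least important."""
    order = sorted(range(len(names)), key=lambda j: importances[j], reverse=True)
    return [names[j] for j in order]


# Toy data: the outcome depends only on the first (hypothetical) feature.
rows = [[i % 2, (i // 2) % 2] for i in range(12)]
labels = [r[0] for r in rows]
importances = permutation_importance(lambda r: r[0], rows, labels)
ranking = rank_features(["feature_a", "feature_b"], importances)
print(importances, ranking)
```

Repeating such a ranking separately for each survey year would give the kind of rank trajectory the abstract describes, although ranks from different models or importance measures are not directly comparable.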
Pub Date: 2025-10-01 | Epub Date: 2025-07-21 | DOI: 10.1016/j.annepidem.2025.07.024
Dor Atias, Saar Ashri, Uri Goldbourt, Yael Benyamini, Ran Gilad-Bachrach, Tal Hasin, Yariv Gerber, Uri Obolski
Background: The volume of healthcare data is expanding rapidly, presenting both challenges and opportunities. Traditional statistical methods applied in epidemiology, such as logistic regression (LR), although widely used, have limited ability to handle the complexity and high dimensionality of modern datasets. In contrast, machine learning (ML) methods can model complex, non-linear relationships and are less constrained by parametric assumptions, making them well suited to uncovering hidden patterns.
Methods: In this study, we aim to introduce ML applications for epidemiologic research and explore three predictive models: LR as a traditional modeling approach, and least absolute shrinkage and selection operator (LASSO) regression and eXtreme Gradient Boosting (XGBoost) as ML approaches. We demonstrate how ML approaches, particularly XGBoost, can benefit epidemiologic research through a real-world case study. We present common steps: data preprocessing, model creation and evaluation processes. Additionally, we address the "black box" nature of ML models and present post hoc explanation tools to enhance interpretability.
Results: We examined the case of near-centenarianism (reaching the age of 95 years or older) prediction using midlife predictors (i.e., demographic, clinical, lifestyle, occupational and dietary variables) in a cohort of approximately 10,000 middle-aged working men recruited in 1963 and followed until death or until 2019. Models were fitted and calibrated on a training set and showed good predictive performance on a separate test set. XGBoost, LASSO regression, and LR achieved ROC-AUC values of 0.72 (95 % CI: 0.66-0.75), 0.71 (95 % CI: 0.67-0.74) and 0.69 (95 % CI: 0.66-0.73), respectively. Explainability analysis identified key predictors for longevity, including systolic blood pressure, smoking status, and a history of myocardial infarction, consistent with prior studies.
Conclusions: Our findings highlight the potential of ML to enhance epidemiological studies by handling complex interactions and high-dimensional data, suggesting a complementary approach to traditional methods.
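The ROC-AUC values and 95 % CIs reported above can be illustrated with a minimal sketch. It assumes a percentile bootstrap over the test set, which the abstract does not confirm as the authors' method; the function names `roc_auc` and `bootstrap_auc_ci` and the toy labels and scores are hypothetical and are not data from the study.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROC-AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy test-set labels (1 = event) and predicted probabilities; purely illustrative.
labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.5, 0.6, 0.65, 0.7, 0.2, 0.9]

auc = roc_auc(labels, scores)
ci = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The pairwise definition is quadratic in sample size, which is fine for a sketch; on a cohort of roughly 10,000 a rank-based computation would be the more practical choice.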
{"title":"Machine learning in epidemiology: An introduction, comparison with traditional methods, and a case study of predicting extreme longevity.","authors":"Dor Atias, Saar Ashri, Uri Goldbourt, Yael Benyamini, Ran Gilad-Bachrach, Tal Hasin, Yariv Gerber, Uri Obolski","doi":"10.1016/j.annepidem.2025.07.024","DOIUrl":"10.1016/j.annepidem.2025.07.024","url":null,"abstract":"<p><strong>Background: </strong>Healthcare data volume is increasingly expanding, presenting both challenges and opportunities. Traditional statistical methods applied in epidemiology, such as logistic regression (LR), albeit widely used, holds limited ability to handle the complexity and high dimensionality of modern datasets. In contrast, machine learning (ML) methods can model complex, non-linear relationships and are less constrained by parametric assumptions, ideal for uncovering hidden patterns.</p><p><strong>Methods: </strong>In this study, we aim to introduce ML applications for epidemiologic research and explore three predictive models: LR as a traditional modeling approach, and least absolute shrinkage and selection operator (LASSO) regression and eXtreme Gradient Boosting (XGBoost) as ML approaches. We demonstrate how ML approaches, particularly XGBoost, can benefit epidemiologic research through a real-world case study. We present common steps: data preprocessing, model creation and evaluation processes. Additionally, we address the \"black box\" nature of ML models and present post hoc explanation tools to enhance interpretability.</p><p><strong>Results: </strong>We examined the case of near-centenarianism (reaching age of 95 years or older) prediction using midlife predictors (i.e., demographic, clinical, lifestyle, occupational and dietary variables) in a cohort of approximately 10,000 middle-aged working men recruited in 1963 and followed until death or until 2019. Models were fitted and calibrated on a training set, showing good predictive performances on a separate test set. 
XGboost, LASSO regression, and LR achieved ROC-AUC values of 0.72 (95 % CI: 0.66-0.75), 0.71 (95 % CI: 0.67-0.74) and 0.69 (95 % CI: 0.66-0.73), respectively. Explainability analysis identified key predictors for longevity, including systolic blood pressure, smoking status, and a history of myocardial infarction; consistent with prior studies.</p><p><strong>Conclusions: </strong>In conclusion, our findings highlight the potential of ML to enhance epidemiological studies by handling complex interactions and high-dimensional data, suggesting a complementary approach to traditional methods.</p>","PeriodicalId":50767,"journal":{"name":"Annals of Epidemiology","volume":" ","pages":"23-33"},"PeriodicalIF":3.0,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144700265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: To explore disparities in cervical cancer diagnosis and outcomes for Asian patients and Native Hawaiian and other Pacific Islanders (NHPIs).
Methods: We extracted cervical cancer patient data collected from the Surveillance, Epidemiology, and End Results 17 database. Odds ratios (ORs) for stage and time ratios (TRs) for survival outcomes were estimated using logistic regression and accelerated failure time models, respectively.
Results: Of 18,770 patients, 15,847 (84.4 %) were White; 2,618 (13.9 %) were Asian; and 305 (1.6 %) were NHPI. NHPI patients were less likely than White patients to be diagnosed at an early stage (adjusted OR [aOR]: 0.60; 95 % CI, 0.47-0.77), whereas Asian patients had a stage at diagnosis similar to that of White patients (aOR: 0.93; 95 % CI, 0.85-1.02). Asian patients, as a group, had significantly longer overall survival (OS) (adjusted TR [aTR]: 1.46; 95 % CI, 1.33-1.61) and disease-specific survival (DSS) (aTR: 1.35; 95 % CI, 1.21-1.51) than White patients; the opposite was true for NHPIs (OS: aTR, 0.80; 95 % CI, 0.64-1.00; DSS: aTR, 0.75; 95 % CI, 0.59-0.97).
Conclusions: We find that NHPI cervical cancer patients tend to be diagnosed later in their disease course than White patients and have shorter survival time post-diagnosis, while Asian patients tend to have longer survival time. These findings support the disaggregation of Asian and NHPI races in cervical cancer investigations.
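As a reading aid for the ratios above: both odds ratios from logistic regression and time ratios from accelerated failure time models are exponentiated coefficients, so a time ratio of 1.46 corresponds to survival times roughly 46 % longer than in the reference group. The sketch below assumes a symmetric Wald interval on the log scale, which the abstract does not state; the helper names are hypothetical, and the recovered coefficient and standard error are back-calculated for illustration rather than reported by the authors.

```python
import math


def ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a log-scale coefficient and its Wald interval into a ratio and CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_scale_from_reported(point, lower, upper, z=1.96):
    """Back-calculate (beta, se) from a reported ratio and CI, assuming a Wald interval."""
    beta = math.log(point)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported overall-survival time ratio for Asian vs. White patients: 1.46 (1.33-1.61).
beta, se = log_scale_from_reported(1.46, 1.33, 1.61)
tr, lo, hi = ratio_with_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}, TR = {tr:.2f} ({lo:.2f}-{hi:.2f})")
```

Because published bounds are rounded, the round trip reproduces the reported interval only to two decimal places.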
{"title":"Differences in cervical cancer stage at diagnosis and survival outcomes among Asian, Native Hawaiian, and other Pacific Islander patients and White patients.","authors":"Zhenyu Ma, Mei Liu, Qipeng Yuan, Ziniu Tang, Peng Shang, Chen Wang, Yueze Li, Jinbo Yue","doi":"10.1016/j.annepidem.2025.07.059","DOIUrl":"10.1016/j.annepidem.2025.07.059","url":null,"abstract":"<p><strong>Purpose: </strong>To explore disparities in cervical cancer diagnosis and outcomes for Asian patients and Native Hawaiian and other Pacific Islanders (NHPIs).</p><p><strong>Methods: </strong>We extracted cervical cancer patient data collected from the Surveillance, Epidemiology, and End Results 17 database. Odds ratios (ORs) for stage and time ratios (TRs) for survival outcomes were estimated using logistic regression and accelerated failure time models, respectively.</p><p><strong>Results: </strong>Of 18770 patients, 15,847 (84.4 %) were White; 2618 (13.9 %) were Asian; and 305 (1.6 %) were NHPI. NHPI patients were less likely than White patients to be diagnosed at an early stage (adjusted OR [aOR]: 0.60; 95 % CI, 0.47-0.77), whereas Asian patients had similar stage-at-diagnosis to White patients (aOR: 0.93; 95 % CI, 0.85-1.02). Asian patients, as a group, had significantly longer overall survival (OS) (adjusted TR [aTR]: 1.46; 95 % CI, 1.33-1.61) and disease-specific survival (DSS) (aTR: 1.35; 95 % CI, 1.21-1.51) than White patients; the opposite was true for NHPIs (OS: aTR, 0.80; 95 % CI, 0.64-1.00; DSS: aTR, 0.75; 95 % CI, 0.59-0.97).</p><p><strong>Conclusions: </strong>We find that NHPI cervical cancer patients tend to be diagnosed later in their disease course than White patients and have shorter survival time post-diagnosis, while Asian patients tend to have longer survival time. 
These findings support the disaggregation of Asian and NHPI races in cervical cancer investigations.</p>","PeriodicalId":50767,"journal":{"name":"Annals of Epidemiology","volume":" ","pages":"43-50"},"PeriodicalIF":3.0,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144719075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}