Pub Date: 2026-01-31 DOI: 10.1016/j.annepidem.2026.01.016
Angela D. Liese PhD, MPH , Brian E. Dixon PhD , Tessa Crume PhD,MSPH , Jasmin Divers PhD , Yi Guo PhD , Annemarie G. Hirsch PhD, MPH , Kristi Reynolds PhD , Levon Utidjian MD , Ibrahim Zaganjor PhD , Marc Rosenman MD , for the DiCAYA Study Group
Purpose
A critical function of public health is to monitor diseases that impede quality of life and burden affected communities. The Diabetes in Children, Adolescents and Young Adults (DiCAYA) Network aims to advance disease monitoring for diabetes using multi-site electronic health record (EHR) data.
Methods
This work involved validating and refining case definitions for accurate identification of type 1 and type 2 diabetes cases to estimate incidence and prevalence of diabetes in children, adolescents, and young adults through age 44 years.
Results
In this essay, we describe the challenges experienced by the Network and lessons learned. Challenges included accessing EHR data, harmonizing EHR data from heterogeneous health systems to a common data model, and developing methods to account for bias introduced by the non-representativeness of health care utilization data. Lessons learned included approaches for data quality assessment, bias correction, and scalability.
Conclusions
As the US continues to evolve its public health data systems and its approach to chronic disease monitoring, the DiCAYA Network offers guidance on factors for success as well as pitfalls to avoid.
Title: Public health monitoring of diabetes in the era of electronic health records: Insights from the Diabetes in Children, Adolescents and Young Adults (DiCAYA) Network
Journal: Annals of Epidemiology, Volume 115, Pages 45-49
Pub Date: 2026-01-30 DOI: 10.1016/j.annepidem.2026.01.011
Yu He MPH , Chanapong Rojanaworarit PhD
Purpose
To compare seven machine learning (ML) models developed to predict non-response to the sexual identity question in the 2023 Youth Risk Behavior Surveillance System (YRBSS) and identify the best-performing ML model, along with key attributes associated with the non-response.
Methods
Data from 20,103 students, with 32 predictors across the domains of personal characteristics, school behavior, substance use, and sexual activity, were analyzed. Supervised ML models, including random forest (RF), gradient boosting, extreme gradient boosting, decision tree, neural network, lasso, and elastic net, were developed and incorporated survey weights. Performance was assessed using F1 score, area under the ROC curve (AUC), and area under the precision-recall curve (AUPRC).
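As a minimal sketch of how two of the reported metrics can account for survey weights, the Python below computes a weighted F1 score and a weighted AUC from scratch. The function names, the toy labels, and the pairwise weighting scheme are illustrative assumptions, not the authors' implementation.

```python
def weighted_f1(y_true, y_pred, weights=None):
    """F1 score for the positive class (1), with optional per-respondent weights."""
    if weights is None:
        weights = [1.0] * len(y_true)
    rows = list(zip(y_true, y_pred, weights))
    tp = sum(w for t, p, w in rows if t == 1 and p == 1)
    fp = sum(w for t, p, w in rows if t == 0 and p == 1)
    fn = sum(w for t, p, w in rows if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_auc(y_true, scores, weights=None):
    """Area under the ROC curve as the weighted share of (positive, negative)
    pairs in which the positive has the higher score; ties count one half."""
    if weights is None:
        weights = [1.0] * len(y_true)
    pos = [(s, w) for t, s, w in zip(y_true, scores, weights) if t == 1]
    neg = [(s, w) for t, s, w in zip(y_true, scores, weights) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    num = den = 0.0
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            pair_weight = w_pos * w_neg
            den += pair_weight
            if s_pos > s_neg:
                num += pair_weight
            elif s_pos == s_neg:
                num += 0.5 * pair_weight
    return num / den


# Hypothetical toy data: 1 = did not answer the item, 0 = answered.
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]
scores = [0.9, 0.2, 0.4, 0.3, 0.6, 0.8]
print(weighted_f1(y_true, y_pred), weighted_auc(y_true, scores))
```

The pairwise loop is quadratic in sample size, so it is only suitable for illustration; with about 10 % non-response, the precision-recall view is usually the more informative one because it focuses on the minority class.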
Results
About 10 % of students did not respond to the sexual identity question, with higher rates among racial/ethnic minorities, including American Indian/Alaska Native and Native Hawaiian/Pacific Islander youths. The RF model showed the most robust overall performance across all metrics. Attributes predicting non-response included response status to questions of school absence due to safety concerns and having ≥ 4 sexual partners.
Conclusions
Non-response was non-random and concentrated among vulnerable groups. Predictive performance was strong, but findings suggest that response patterns to other sensitive survey items play a substantial role, with implications for survey design and non-response adjustment.
Title: Predicting nonresponse to sexual identity question in youth risk behavior surveillance: A machine learning analysis of complex survey data
Journal: Annals of Epidemiology, Volume 115, Pages 37-44
Pub Date: 2026-01-29 DOI: 10.1016/j.annepidem.2026.01.014
Afroza Parvin , Rebecca D. Kehm , Baozhen Qiao , James E. Cone , Mark R. Farfel , Rachel Zeig-Owens , David G. Goldfarb , Moshe Z. Shapiro , Andrew C. Todd , Tabassum Insaf , Charles B. Hall , Paolo Boffetta , Jiehui Li
Purpose
The World Trade Center Health Program (WTCHP) plays a critical role in medical monitoring and treatment for those exposed to the terrorist attacks of September 11, 2001 (9/11). We investigated the association of WTCHP membership with mortality risk among 9/11 responders while controlling for comorbidities using inverse probability weighting.
Methods
We prospectively analyzed 28,430 9/11 responders, followed from the time of their enrollment into the WTCHP or the WTC Health Registry through 2020. Linkage to the National Death Index (NDI) provided death data. Non-cancer comorbidities were self-reported physician diagnoses, and cancer was identified through cancer registry linkage. We estimated the adjusted hazard ratio (aHR) with 95 % confidence interval (CI) for the association between WTCHP membership and all-cause and cause-specific mortality using Cox proportional hazards models and cause-specific hazard regression models, respectively.
Results
A total of 1657 deaths were identified over 444,425 person-years of follow-up. Compared to non-members, WTCHP members had a lower risk of all-cause mortality (aHR=0.87; 95 % CI=0.77–0.98) and smoking-related mortality (aHR=0.83; 95 % CI=0.69–0.99) after adjusting for demographics, WTC exposure, and weights of comorbidities. With the membership-sex interaction included, reduced risk of all-cause mortality remained statistically significant among males only (aHR=0.85; 95 % CI=0.75–0.96). Cancer- and heart-related mortality risks were not significantly different between WTCHP members and non-members.
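The reported counts and interval allow a quick arithmetic check. The sketch below derives the crude overall mortality rate (members and non-members combined) and backs out the point estimate and standard error implied by the all-cause CI, assuming a Wald interval that is symmetric on the log scale. That assumption and the helper name are ours, not the authors'.

```python
import math

deaths = 1657
person_years = 444_425

# Crude overall rate per 1000 person-years (not adjusted, not split by membership).
crude_rate_per_1000 = deaths / person_years * 1000


def log_scale_summary(lower, upper, z=1.96):
    """Point estimate and log-scale SE implied by a ratio's 95 % CI,
    assuming the interval is exp(log(estimate) +/- z * SE)."""
    log_mid = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_mid), se


implied_hr, se_log_hr = log_scale_summary(0.77, 0.98)
print(round(crude_rate_per_1000, 2))  # 3.73
print(round(implied_hr, 2))           # 0.87, matching the reported aHR
```

The implied midpoint agreeing with the reported aHR of 0.87 is consistent with a log-symmetric interval, which is the usual form for Cox model output.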
Conclusions
This study found that WTCHP membership may reduce risks of all-cause and smoking-related mortality among 9/11 responders, even after accounting for underlying medical conditions, underscoring the importance of comprehensive health monitoring and treatment services for disaster-relief workers.
Title: Effect of World Trade Center Health Program on mortality among 9/11 responders
Journal: Annals of Epidemiology, Volume 115, Pages 8-14
Pub Date: 2026-01-29 DOI: 10.1016/j.annepidem.2026.01.013
Romain Brisson
Purpose
This study examined how careless and inconsistent reporting affects adolescent suicidality prevalence and sex differences, a methodological issue often overlooked in self-report epidemiological research.
Methods
I used data from two nationally representative surveys of secondary-school students conducted in 2010 (n = 7640; 49.3 % female) and 2014 (n = 5592; 52.6 % female). Both surveys assessed depressive symptoms, suicidal ideation, suicide plans, suicide attempts, attempt recognition, and attempt disclosure. Three methods of prevalence computation were used: unadjusted estimates (M1); excluding fictitious drug endorsers and treating inconsistencies as missing (M2); and excluding all careless and inconsistent reporters (M3).
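The three computation methods can be sketched as follows. The record layout (`fake_drug`, `inconsistent_items`) and the toy respondents are hypothetical, and the sketch assumes that endorsing a fictitious drug is the marker of careless reporting.

```python
def prevalence(records, item, method):
    """Prevalence of `item` under M1 (unadjusted), M2, or M3."""
    values = []
    for r in records:
        # M2 and M3 both drop respondents who endorsed the fictitious drug.
        if method in ("M2", "M3") and r["fake_drug"]:
            continue
        # M2: an inconsistent answer is treated as missing for that item only.
        if method == "M2" and item in r["inconsistent_items"]:
            continue
        # M3: any inconsistency removes the respondent from every estimate.
        if method == "M3" and r["inconsistent_items"]:
            continue
        values.append(r[item])
    return sum(values) / len(values)


# Hypothetical respondents (1 = endorsed the item, 0 = did not).
records = [
    {"fake_drug": False, "inconsistent_items": set(), "ideation": 1, "attempt": 0},
    {"fake_drug": False, "inconsistent_items": set(), "ideation": 0, "attempt": 0},
    {"fake_drug": True, "inconsistent_items": set(), "ideation": 1, "attempt": 1},
    {"fake_drug": False, "inconsistent_items": {"attempt"}, "ideation": 1, "attempt": 1},
    {"fake_drug": False, "inconsistent_items": set(), "ideation": 0, "attempt": 0},
    {"fake_drug": False, "inconsistent_items": set(), "ideation": 0, "attempt": 1},
]

for method in ("M1", "M2", "M3"):
    print(method, prevalence(records, "ideation", method), prevalence(records, "attempt", method))
```

In this toy sample, M2 and M3 agree for the item that carries the inconsistency but diverge for ideation, because M3 also discards the inconsistent respondent's otherwise usable answers.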
Results
About 19 % of respondents were identified as careless or inconsistent. Compared to M1, M2 and M3 yielded lower prevalence estimates for most indicators. The largest reductions involved, on average, reports of unnoticed suicide attempts (-73.8 %), talking to no one about an attempt (-73.3 %), and reporting six or more suicide attempts (-35.9 %). Most sex differences were unaffected, except for the ‘six or more suicide attempts’ category and attempt recognition and disclosure items.
Conclusions
Overlooking misreporting may inflate adolescent suicidality prevalence and distort sex-difference estimates. Incorporating validity checks and data-cleaning procedures can improve the accuracy of epidemiological findings and the effectiveness of prevention programs.
Title: Careless and inconsistent reporting inflates suicidality prevalence and biases sex differences
Journal: Annals of Epidemiology, Volume 115, Pages 23-27
Pub Date: 2026-01-23 DOI: 10.1016/j.annepidem.2026.01.007
Longjian Liu MD, PhD, MSc, Jintong Hou, PhD
Purpose
We aimed to identify key midlife dementia predictors and develop a novel machine learning (ML)-enabled risk prediction model.
Methods
We used data from 9266 Atherosclerosis Risk in Communities study participants (aged 45–64 years at baseline, 1987–1989). Incident dementia was ascertained through December 2019. An ML-based LASSO-Cox model was applied to develop the risk prediction model.
Results
Over a 25-year mean follow-up, 2010 participants developed dementia. The LASSO-Cox model identified 12 key predictors and achieved C-indices (95 %CI) of 0.77 (0.75–0.79) in the training set (n = 6182) and 0.78 (0.76–0.81) in the test set (n = 3084). Predictors included age, Digit Symbol Substitution Test, apolipoprotein E ε4, HbA1c, brachial blood pressure, Factor VIII, Delayed Word Recall Test, hypertension, stroke history, C-reactive protein, white blood cell count, and apolipoprotein B. The resulting nomogram demonstrated strong discrimination (AUC 0.77–0.86) and good calibration. LASSO-Cox risk score quartiles effectively stratified participants into low, moderate, high, and very high dementia risk groups.
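For readers unfamiliar with the C-index reported above, here is a minimal from-scratch sketch of Harrell's concordance index for right-censored data. The toy follow-up times, event flags, and risk scores are hypothetical, and this is not the authors' code.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs, the share in which the subject
    who had the event earlier also has the higher predicted risk.
    A pair (i, j) is comparable when i had an event before j's follow-up time."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: years of follow-up, dementia event flag, model risk score.
times = [5, 10, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.6, 0.5, 0.1]
print(concordance_index(times, events, risk_scores))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which puts the reported 0.77 and 0.78 in context.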
Conclusions
The findings demonstrate that the newly developed machine learning-based LASSO-Cox model provides a robust method to predict individuals at high risk of dementia.
Title: Machine learning-based LASSO-Cox model for dementia prediction: The role of midlife cardiometabolic, inflammatory, and genetic risk factors in a US cohort
Journal: Annals of Epidemiology, Volume 115, Pages 28-36
Pub Date: 2026-01-17 DOI: 10.1016/j.annepidem.2026.01.008
Angela D’Adamo , Amii M. Kress , Rima Habre , Nissa Towe-Goodman , Michael R. Desjardins , Akram Alshawabkeh , Izzuddin M. Aris , Carlos A. Camargo Jr. , Kecia N. Carroll , Andrea E. Cassidy-Bushrow , Su H. Chu , Yolaine Civil , Alexandrea L. Craft , Lisa A. Croen , Sean Deoni , Viren Dsa , Anne L. Dunlop , Amy J. Elliott , Assiamira Ferrara , Jody M. Ganiban , Emily A. Knapp
Purpose
To examine factors associated with moving during pregnancy and the impact of assigning neighborhood socioeconomic status (nSES) at enrollment, at delivery, or as a time-weighted average on estimated associations with birth outcomes (birthweight, birthweight-for-gestational-age z-score, low birthweight, gestational age, small-for-gestational age, preterm birth).
Methods
We used data from the Environmental influences on Child Health Outcomes (ECHO) Cohort Study (2010–2019) with nSES data from the American Community Survey (ACS) matched by time and location to monthly residential histories. We used multivariable logistic models with Generalized Estimating Equations to identify factors associated with moving and quantify exposure misclassification in model estimates.
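The three ways of assigning nSES can be illustrated with a small sketch. It assumes a residential history expressed as chronological (months at address, nSES value) spells that starts at enrollment and ends at delivery; the function name and the index values are hypothetical.

```python
def assign_nses(spells):
    """spells: chronological list of (months_at_address, nses_value) tuples.
    Returns the nSES value under each of the three assignment rules."""
    total_months = sum(months for months, _ in spells)
    if total_months <= 0:
        raise ValueError("residential history must cover at least one month")
    weighted_sum = sum(months * value for months, value in spells)
    return {
        "enrollment": spells[0][1],    # address at the start of the history
        "delivery": spells[-1][1],     # address at the end of the history
        "time_weighted": weighted_sum / total_months,
    }


# Hypothetical participants: one who moved after 3 months, one who never moved.
mover = [(3, 40.0), (6, 70.0)]
non_mover = [(9, 55.0)]
print(assign_nses(mover))      # three different exposure values
print(assign_nses(non_mover))  # all three rules agree
```

For non-movers the three rules coincide, so any difference between assignment rules is driven entirely by participants who moved.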
Results
Approximately 7 % of 15,376 participants moved at least once during pregnancy. Maternal age (OR: 0.97, 95 % CI: 0.95, 0.98) and other race vs. White (OR: 0.39, 95 % CI: 0.20, 0.80) were associated with lower odds of moving; lower neighborhood-level education (OR: 1.34, 95 % CI: 1.11, 1.62) and living in urban neighborhoods (OR: 3.03, 95 % CI: 1.39, 6.59) were associated with higher odds. Among movers, estimated associations between nSES and birth outcomes changed by ≥ 16 % depending on which address was used to assign nSES; the association with birthweight-for-gestational-age z-score was significant only when using nSES at delivery.
Conclusion
Sociodemographic and nSES characteristics are associated with moving during pregnancy; movers may experience exposure misclassification and underestimated effects on birth outcomes.
Title: Residential mobility during pregnancy and birth outcomes in the United States: The environmental influences on Child Health Outcomes (ECHO) Cohort (2010–2019)
Journal: Annals of Epidemiology, Volume 115, Pages 15-22
Pub Date: 2026-01-13 DOI: 10.1016/j.annepidem.2026.01.004
Peter M. Socha , Maryam Oskoui , Jennifer A. Hutcheon , Sam Harper
Purpose
To improve the identification of cerebral palsy cases in administrative health data.
Methods
We included all children in a population-based cerebral palsy registry in Quebec, Canada, born from 1999 through 2002, and a sample of children without cerebral palsy. Population-based hospitalization and physician billing records through 2012 were obtained for all children. We used logistic regression to model the probability of cerebral palsy, using International Classification of Diseases codes for related diseases. We reported receiver operating characteristic (ROC) and precision-recall (PR) curves, and compared the accuracy to that of existing algorithms. We also reported the accuracy of cerebral palsy codes by age, data source, and gestational age at birth.
Results
The areas under the ROC and PR curves of our model were 0.98 (95 % CI: 0.97–0.99) and 0.73 (95 % CI: 0.63–0.79), respectively. Cut-offs with a similar specificity to existing algorithms yielded sensitivities that were 1–14 percentage points higher. The sensitivity of cerebral palsy codes was higher (and the specificity was lower) with longer follow-up times since birth, when using both hospitalization and billing records, and among children born preterm.
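The trade-off behind choosing a cut-off on a model's predicted probability can be sketched as below. The labels and probabilities are hypothetical toy values, not registry data.

```python
def sensitivity_specificity(y_true, probs, cutoff):
    """Classify as a case when predicted probability >= cutoff, then
    return (sensitivity, specificity) against the reference labels."""
    tp = fn = tn = fp = 0
    for truth, prob in zip(y_true, probs):
        predicted_case = prob >= cutoff
        if truth == 1 and predicted_case:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical children: 1 = registry-confirmed case, 0 = no cerebral palsy.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.95, 0.60, 0.30, 0.40, 0.20, 0.10, 0.05, 0.02]

for cutoff in (0.5, 0.25):
    sens, spec = sensitivity_specificity(y_true, probs, cutoff)
    print(cutoff, round(sens, 2), round(spec, 2))
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is why a fair comparison with an existing algorithm fixes one of the two, as the abstract does with specificity.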
Conclusions
Our model improved identification of cerebral palsy cases in administrative data, but residual misclassification remained.
Title: A multivariable model for improving the identification of cerebral palsy cases in administrative health data
Journal: Annals of Epidemiology, Volume 114, Pages 26-31
Pub Date: 2026-01-12 DOI: 10.1016/j.annepidem.2026.01.003
Emaan Rashidi MHS , Madeline Brooks MPH , Ahmed Hassoon MD, MPH , Shruti Mehta PhD, MPH , Keri Althoff PhD, MPH , G. Caleb Alexander MD, MS
Epidemiology has long been central to public health, guiding our understanding of the distribution and determinants of disease. As the field has evolved—from John Snow’s cholera investigations to large-scale cohort studies and causal inference frameworks—it now faces a transformative juncture with the advent of artificial intelligence/machine learning (AI/ML). These technologies offer unprecedented opportunities to improve data measurement, inference, and population health insights, yet also pose methodological and ethical challenges. Anchored by the core epidemiologic domains of study population, measurement, and inference, we examine how epidemiologists can use AI/ML effectively. We consider the importance of careful population definition, informed sampling, and external validation to ensure generalizability and minimize bias when AI/ML is used. We also explore the need for rigorous assessment of data quality and model reliability, which strengthens the case for conceptual frameworks in guiding interpretation of scientific investigations. To realize AI/ML’s potential, epidemiology must adapt its training, invest in infrastructure, and promote interdisciplinary collaboration. Doing so will ensure that epidemiologic science remains robust, reproducible, and relevant in a rapidly evolving informational landscape. This moment calls for a strategic integration of AI/ML into the fabric of epidemiologic practice and training to advance both science and public health.
Title: Is artificial intelligence a friend or foe to epidemiology?
Journal: Annals of Epidemiology, Volume 115, Pages 2-7
Pub Date : 2026-01-07 DOI: 10.1016/j.annepidem.2026.01.002
Omobola O. Oluwafemi, Laura E. Mitchell, Jenil R. Patel, Wendy N. Nembhard, Gary M. Shaw, Andrew F. Olshan, Han Chen, A.J. Agopian
Purpose
To estimate associations between paternal race and ethnicity and a spectrum of birth defects.
Methods
We analyzed data from the National Birth Defects Prevention Study for infants with birth defects and controls delivered between 1997 and 2011. Using unconditional logistic regression, we assessed associations between paternal race and ethnicity and 32 birth defects, before and after adjusting for maternal race and ethnicity and 14 other factors.
Results
Data from 33,455 fathers were analyzed (889 Asian/Pacific Islander [A/PI], 8,394 Hispanic, 4,139 non-Hispanic Black [NHB], and 20,033 non-Hispanic White [NHW]). Compared with NHW fathers, A/PI paternal race and ethnicity was significantly associated with 6/32 defects, Hispanic paternal ethnicity with 6/32 defects, and NHB paternal race and ethnicity with 7/32 defects, after adjustment. The strongest associations included A/PI and pulmonary valve stenosis (adjusted odds ratio [aOR] 0.36, 95% CI 0.18–0.71), Hispanic and heterotaxy (aOR 2.53, 95% CI 1.57–4.06), and NHB and gastroschisis (aOR 2.25, 95% CI 1.62–3.12).
Conclusions
Paternal race and ethnicity were associated with heterotaxy, cleft lip with or without cleft palate, and spina bifida, independent of maternal race and ethnicity. These findings warrant replication and further investigation into biological, environmental, and social mechanisms that may underlie these associations.
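For readers who want to relate the reported adjusted odds ratios to the underlying logistic regression coefficients, the arithmetic can be sketched as follows. This is an illustrative calculation, not the authors' analysis code. It assumes the 95% confidence intervals are Wald intervals, symmetric on the log-odds scale; the abstract does not state how they were computed. The function names `log_or_and_se` and `wald_ci` are hypothetical helpers introduced only for this example.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(or_point, ci_low, ci_high, z=Z_95):
    """Recover the log odds ratio and its standard error from a reported CI.

    Assumption: the interval is a Wald interval, exp(beta -/+ z * SE).
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def wald_ci(beta, se, z=Z_95):
    """Rebuild the odds ratio and its Wald interval from a coefficient and SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Estimates as reported in the abstract: (label, aOR, lower bound, upper bound).
reported = [
    ("A/PI, pulmonary valve stenosis", 0.36, 0.18, 0.71),
    ("Hispanic, heterotaxy", 2.53, 1.57, 4.06),
    ("NHB, gastroschisis", 2.25, 1.62, 3.12),
]

for label, aor, low, high in reported:
    beta, se = log_or_and_se(aor, low, high)
    # A Wald interval is symmetric on the log scale, so the geometric mean
    # of its bounds should sit close to the reported point estimate.
    midpoint = math.sqrt(low * high)
    print(f"{label}: beta={beta:.3f}, SE={se:.3f}, geometric midpoint={midpoint:.2f}")
```

If the geometric midpoint of an interval lands near the reported aOR, as it does for these three estimates up to rounding, the Wald assumption is at least plausible. The recovered standard errors are approximate because the published bounds are rounded to two decimals.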
{"title":"The association between paternal race and ethnicity and a spectrum of birth defects in a national case-control study","authors":"Omobola O. Oluwafemi; Laura E. Mitchell; Jenil R. Patel; Wendy N. Nembhard; Gary M. Shaw; Andrew F. Olshan; Han Chen; A.J. Agopian","doi":"10.1016/j.annepidem.2026.01.002","journal":{"name":"Annals of Epidemiology","volume":"114","pages":"12-21"},"publicationDate":"2026-01-07","publicationTypes":"Journal Article","isOpenAccess":false}