A Latent Trait-based Measure as a Data Harmonization and Missing Data Solution Applied to the Environmental Influences on Child Health Outcomes Cohort.
Emily A Knapp, Amii M Kress, Ronel Ghidey, Tyler J Gorham, Brendan Galdo, Stephen A Petrill, Izzuddin M Aris, Theresa M Bastain, Carlos A Camargo, Michael A Coccia, Nicholas Cragoe, Dana Dabelea, Anne L Dunlop, Tebeb Gebretsadik, Tina Hartert, Alison E Hipwell, Christine C Johnson, Margaret R Karagas, Kaja Z LeWinn, Luis Enrique Maldonado, Cindy T McEvoy, Hooman Mirzakhani, Thomas G O'Connor, T Michael O'Shea, Zhu Wang, Rosalind J Wright, Katherine Ziegler, Yeyi Zhu, Christopher W Bartlett, Bryan Lau
Journal: Epidemiology. Publication date: 2025-01-30. DOI: 10.1097/EDE.0000000000001832 (https://doi.org/10.1097/EDE.0000000000001832)
Abstract
Background: Collaborative research consortia provide an efficient method to increase sample size, enabling evaluation of subgroup heterogeneity and rare outcomes. In addition to the missing data challenges faced by all cohort studies, such as nonresponse and attrition, collaborative studies have missing data arising from differences in study design and measurement across the contributing studies.
Methods: We extend ROSETTA, a latent variable method that creates common measures across datasets collecting the same latent constructs with only partial overlap in measures, to define a common measure of socioeconomic status (SES) across cohorts with varying indicators in the Environmental influences on Child Health Outcomes Cohort, a consortium of pregnancy and pediatric cohorts.
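The partial-overlap problem described in the Methods can be illustrated with a toy sketch. This is not the ROSETTA algorithm, whose estimation details the abstract does not give; it only shows how pooled records with cohort-specific gaps can still yield pairwise-complete correlations, the kind of input a latent variable model typically starts from. The indicator names, cohorts, and values below are all hypothetical.

```python
from math import sqrt

# Hypothetical toy records; None marks an indicator a cohort did not collect.
pooled = [
    # Hypothetical cohort A: income and education only
    {"income": 1.0, "education": 2.0, "insurance": None},
    {"income": 2.0, "education": 3.0, "insurance": None},
    {"income": 3.0, "education": 5.0, "insurance": None},
    # Hypothetical cohort B: education and insurance only
    {"income": None, "education": 1.0, "insurance": 0.0},
    {"income": None, "education": 4.0, "insurance": 1.0},
    {"income": None, "education": 5.0, "insurance": 1.0},
]


def pairwise_corr(rows, a, b):
    """Pearson correlation using only rows where both a and b are observed."""
    pairs = [(r[a], r[b]) for r in rows
             if r[a] is not None and r[b] is not None]
    n = len(pairs)
    if n < 2:
        return None  # the two indicators are never observed together
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    var_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    var_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if var_a == 0 or var_b == 0:
        return None  # correlation is undefined for a constant indicator
    return cov / sqrt(var_a * var_b)


indicators = ["income", "education", "insurance"]
corr = {(a, b): pairwise_corr(pooled, a, b)
        for a in indicators for b in indicators}

for (a, b), r in corr.items():
    print(a, b, "no overlap" if r is None else round(r, 3))
```

In this toy setup, education is the only indicator shared by both cohorts, so income and insurance are never observed together and their correlation cannot be estimated directly. Bridging gaps of this kind is what a common latent measure is meant to address.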
Results: Starting with 52 indicators of prenatal SES from 39,372 participants across 53 cohorts, ROSETTA created three factors representing key domains of SES: income and education, insurance and poverty, and unemployment. At least one factor score was available for 34,528 participants; two factors were available for more participants than any single indicator. Factors fit the data well, had content validity, and were correlated with alternative measures of SES (for the income & education factor, r = 0.40-0.89). Higher SES as measured by the factor scores was associated with lower odds of prenatal smoking: OR for income & education 0.42 (95% CI 0.38, 0.45). ROSETTA reduced missing data relative to most alternative methods, with the exception of multiple imputation.
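As a reading aid for the reported odds ratio, the short sketch below converts OR 0.42 (95% CI 0.38, 0.45) to the log-odds scale. It assumes a Wald-type interval that is symmetric on the log scale, which is common for logistic regression but is not stated in the abstract; the variable names are illustrative. Because the published figures are rounded to two decimals, the rebuilt interval will only approximately match the reported one.

```python
from math import exp, log

# Values reported in the abstract for the income & education factor.
odds_ratio = 0.42
ci_low, ci_high = 0.38, 0.45

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

# Log-odds coefficient implied by the odds ratio.
beta = log(odds_ratio)

# Approximate standard error, ASSUMING a Wald interval on the log scale:
# the interval width on the log scale equals 2 * Z_95 * SE.
se = (log(ci_high) - log(ci_low)) / (2 * Z_95)

# Percentage reduction in odds implied by the odds ratio.
pct_lower_odds = (1 - odds_ratio) * 100

# Rebuild the interval from beta and se as a consistency check.
rebuilt_ci = (exp(beta - Z_95 * se), exp(beta + Z_95 * se))

print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"odds lower by about {pct_lower_odds:.0f}%")
print(f"rebuilt 95% CI = ({rebuilt_ci[0]:.2f}, {rebuilt_ci[1]:.2f})")
```

The abstract does not state the unit of change in the factor score to which this odds ratio corresponds, so the roughly 58% lower odds should be read relative to whatever contrast the full paper defines.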
Conclusions: ROSETTA aids in pooled analysis of individual participant data by creating measures on a common scale and maximizing data in the presence of missing and mismatched measures.
About the journal:
Epidemiology publishes original research from all fields of epidemiology. The journal also welcomes review articles and meta-analyses, novel hypotheses, descriptions and applications of new methods, and discussions of research theory or public health policy. We give special consideration to papers from developing countries.