Harmonizing osteoporosis-related data across multiple data sets is essential for improving the accuracy and generalizability of bone mineral density (BMD) assessments. This study developed a harmonization framework to standardize phenotypic and genomic variables across three major US osteoporosis data sets: GDBF, GWAS, and NHANES. We standardized key phenotypic variables (BMD, body mass index (BMI), age, sex, and race/ethnicity) using cohort-specific data dictionaries and applied multiple imputation by chained equations (MICE) to handle missing data. Genomic data were harmonized using principal component analysis (PCA)-based batch effect correction. Residual regression methods were applied to standardize BMD values. The effect of harmonization on BMD prediction was evaluated using generalized estimating equations (GEEs) and mixed-effects models. After harmonization, inter-study variability in BMI was significantly reduced (Ω² = 0.0028), and BMD associations with covariates remained consistent across data sets. Harmonized models showed improved predictive performance, with explained variance in BMD increasing (R² = 0.14%). PCA confirmed the effective alignment of genetic data, reducing batch effects and improving cross-study compatibility. This study demonstrates the feasibility and effectiveness of harmonizing phenotypic and genomic data for osteoporosis research. The harmonization framework enhances BMD prediction accuracy, supports more inclusive osteoporosis risk assessment, and improves the integration of multi-cohort data sets for future research. These findings highlight the potential of data harmonization in advancing precision medicine for osteoporosis prevention and management.
Integrative Harmonization of Phenotypic and Genomic Data Improves Bone Mineral Density Prediction in Multi-Study Osteoporosis Research. Anqi Liu, Jianing Liu, Lang Wu, Qing Wu. Genetic Epidemiology, vol. 50, no. 1, e70028. Published 2026-02-01. DOI: 10.1002/gepi.70028. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12836449/pdf/
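The osteoporosis abstract above mentions residual regression for standardizing BMD but does not give the model. The following is a minimal sketch of one common reading of that idea, assuming BMD is regressed on covariates within each cohort and the residuals are scaled to unit variance. The function name `residualize_by_cohort`, the cohort labels, and the simulated data are all hypothetical and are not taken from the study.

```python
import numpy as np

def residualize_by_cohort(y, covariates, cohort):
    """Regress y on covariates within each cohort and return z-scored residuals.

    Assumed approach for illustration only; the source does not specify the model.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(covariates, dtype=float)
    cohort = np.asarray(cohort)
    out = np.empty_like(y)
    for label in np.unique(cohort):
        mask = cohort == label
        # Design matrix with an intercept, fitted separately per cohort.
        design = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        resid = y[mask] - design @ beta
        # Scale residuals so every cohort is on the same unit-variance scale.
        out[mask] = resid / resid.std(ddof=1)
    return out

# Simulated example with three hypothetical cohorts that differ by an offset.
rng = np.random.default_rng(0)
n = 300
cohort = np.repeat(["A", "B", "C"], n // 3)
age = rng.uniform(40, 80, n)
bmi = rng.normal(27, 4, n)
offset = np.repeat([0.00, 0.05, -0.03], n // 3)
bmd = 1.2 - 0.004 * age + 0.006 * bmi + offset + rng.normal(0, 0.1, n)

z = residualize_by_cohort(bmd, np.column_stack([age, bmi]), cohort)
```

Because each cohort gets its own intercept, cohort-level offsets are absorbed by the regression, and the returned values have mean zero and unit variance within every cohort.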
Benjamin Woolf, Amy Mason, Chin Yang Shapland, Hyunseung Kang, Hannah M. Sallis, Stephen Burgess, Marcus R. Munafò
A longstanding aim of developmental psychology and epidemiology is to understand the causal effects of parental phenotypes on offspring outcomes. Traditional approaches often fail to account for confounding and reverse causation. We evaluate the use of Mendelian randomisation with non-inherited variants (MR-NIV) to address these limitations. MR-NIV leverages non-inherited genetic variants to instrument the parental phenotype independently of the offspring's genotype. We used directed acyclic graphs and simulations to validate MR-NIV and explore its robustness to assortative mating. In contrast to an alternative MR method that adjusts the parental genotype for the offspring genotype, MR-NIV can be robust to assortative mating when used without trio data. In settings without trio data, MR-NIV outperformed the adjustment method; in settings with trio data, the adjustment method outperformed MR-NIV. Applying MR-NIV to the Avon Longitudinal Study of Parents and Children, we assessed the causal effect of parental smoking on offspring smoking initiation at age 16. Results were consistent with observational studies, suggesting a meaningful increase in the risk of offspring smoking due to parental smoking. However, larger sample sizes will be necessary to provide a precise answer. MR-NIV offers a promising extension of Mendelian randomisation for studying the developmental environment.
Extending the Use of Mendelian Randomisation With Non-Inherited Variants to Assess Socially Transmitted Parental Exposures Under Assortative Mating. Genetic Epidemiology, vol. 50, no. 1. Published 2026-01-21. DOI: 10.1002/gepi.70031. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12820921/pdf/
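The MR-NIV abstract above describes using non-inherited parental variants as instruments for the parental phenotype. The toy simulation below is a sketch of that intuition only and is not the authors' simulation design. It assumes a single variant, random mating, a continuous outcome, a known non-inherited allele, and a simple Wald ratio estimator. All parameter values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
freq = 0.3    # allele frequency (assumed)
b = 0.5       # effect of each parental allele on the parental exposure
theta = 0.4   # true causal effect of parental exposure on offspring outcome
gamma = 0.3   # direct effect of the offspring's own genotype on the outcome

transmitted = rng.binomial(1, freq, n)      # parental allele passed to the child
non_inherited = rng.binomial(1, freq, n)    # parental allele not passed on
other_parent = rng.binomial(1, freq, n)     # allele from the other parent (random mating)

confounder = rng.normal(size=n)             # shared family environment
exposure = b * (transmitted + non_inherited) + confounder + rng.normal(size=n)
offspring_genotype = transmitted + other_parent
outcome = (theta * exposure + gamma * offspring_genotype
           + confounder + rng.normal(size=n))

def wald_ratio(instrument, x, y):
    """Ratio of the instrument-outcome covariance to the instrument-exposure covariance."""
    return np.cov(instrument, y)[0, 1] / np.cov(instrument, x)[0, 1]

# Instrumenting with the non-inherited allele avoids the path through the child's genotype.
mr_niv_estimate = wald_ratio(non_inherited, exposure, outcome)
# Using the full parental genotype without adjustment picks up the child's own genetic effect.
naive_estimate = wald_ratio(transmitted + non_inherited, exposure, outcome)
```

In this setup the non-inherited allele is independent of both the confounder and the offspring genotype, so its Wald ratio targets `theta`, whereas the unadjusted parental genotype is biased upward by about `gamma / (2 * b)`.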
Md. Moksedul Momin, Xuan Zhou, Muktar Ahmed, Elina Hyppönen, Beben Benyamin, S. Hong Lee
Accurate prediction of disease risk and other complex traits across different populations is essential for clinical and research purposes. However, genetic differences among ancestries, such as allele frequencies and genetic architecture, can affect the performance of polygenic risk score (PGS) methods in cross-ancestry prediction. To address this issue, we conducted a formal test of seven polygenic prediction methods applicable across ancestries for five traits (BMI, standing height, and LDL, HDL, and total cholesterol) from the UK Biobank dataset. We demonstrate that GBLUP and PRS-CSx outperformed other methods for highly polygenic traits like height and BMI. In contrast, PRSice and PolyPred performed best for less polygenic traits like cholesterol, with PRS-CSx performing comparably at larger sample sizes. We also observed that utilizing concordant SNPs, which have the same effect direction across diverse ancestries, can improve the accuracy of cross-ancestry PGS models. Furthermore, we found that the transferability of PGS across ancestries varied depending on the trait. Understanding the strengths and limitations of different methods and approaches is important for future methodological development and improvement, enabling better interpretation and application of PGS results in clinical and research settings.
Cross-Ancestry Polygenic Prediction: Comparing Methods and Assessing Transferability Across Traits. Genetic Epidemiology, vol. 50, no. 1. Published 2026-01-21. DOI: 10.1002/gepi.70029. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12820924/pdf/
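The cross-ancestry abstract above defines concordant SNPs as those with the same effect direction across ancestries. A minimal sketch of that filter, followed by a plain weighted-sum score, could look like the following. The helper names `concordant_mask` and `polygenic_score` and the small arrays are hypothetical, and the study's actual selection procedure may differ.

```python
import numpy as np

def concordant_mask(betas):
    """Return True for SNPs whose effect estimates share one non-zero sign in every ancestry.

    betas: array of shape (n_ancestries, n_snps).
    """
    signs = np.sign(np.asarray(betas, dtype=float))
    return np.all(signs == signs[0], axis=0) & (signs[0] != 0)

def polygenic_score(genotypes, weights, mask):
    """Weighted allele-count sum restricted to the SNPs selected by mask."""
    G = np.asarray(genotypes, dtype=float)
    w = np.asarray(weights, dtype=float)
    return G[:, mask] @ w[mask]

# Effect estimates for four SNPs in two ancestries (made-up values).
betas = np.array([
    [0.20, -0.10, 0.05, 0.00],
    [0.10,  0.30, 0.02, 0.10],
])
# Allele counts for two individuals at the same four SNPs.
genotypes = np.array([
    [0, 1, 2, 1],
    [2, 0, 1, 0],
])

mask = concordant_mask(betas)                          # SNPs 1 and 3 are kept
scores = polygenic_score(genotypes, betas[0], mask)    # weights from the first ancestry
```

Treating a zero estimate as discordant is a design choice made here for simplicity; a real analysis would more likely apply a significance or magnitude threshold before comparing signs.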
Mediation analysis is a pivotal tool for elucidating the indirect effect of an environmental factor or treatment on disease through potentially high-dimensional omics data, such as gene expression profiles. However, traditional mediation analysis methods tailored for binary outcomes often rely on the rare disease assumption in logistic regression and provide inadequate measures of total mediation effect when multiple mediators have effects in different directions. In this paper, we develop a MEdiation analysis framework in LOgistic regression for high-Dimensional mediators and a binarY outcome (MELODY). It leverages a second-moment-based measure analogous to the