Yinqi Zhao, Qiran Jia, Jesse Goodrich, Burcu Darst, David V Conti
An extension of latent unknown clustering integrating multi-omics data (LUCID) incorporating incomplete omics data
Bioinformatics Advances, 4(1), vbae123. Published 2024-08-24. DOI: https://doi.org/10.1093/bioadv/vbae123
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368387/pdf/
Abstract
Motivation: Latent unknown clustering integrating multi-omics data (LUCID) is a novel statistical model designed for multi-omics data analysis. It integrates omics data with exposures and an outcome through a latent cluster, elucidating how exposures influence processes reflected in multi-omics measurements and ultimately affect an outcome. A significant challenge in multi-omics analysis is list-wise missingness. To address this, we extend the model to handle list-wise missingness within an integrated imputation framework, which can also handle sporadic missingness when necessary.
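To make the two missingness patterns concrete, the following is a minimal sketch in Python, not the LUCIDus implementation. It assumes that "list-wise" means every feature in an omics layer is unmeasured for a subject, and that "sporadic" means only some features are missing. The function name `missingness_pattern` and the toy values are hypothetical and chosen for illustration only.

```python
import math
from typing import Optional, Sequence


def missingness_pattern(row: Sequence[Optional[float]]) -> str:
    """Classify one subject's measurements within a single omics layer.

    Assumption (not taken from the paper's code): a value is missing
    if it is None or NaN.
    """
    n_missing = sum(1 for v in row if v is None or math.isnan(v))
    if n_missing == 0:
        return "complete"
    if n_missing == len(row):
        return "list-wise"  # the whole layer is unmeasured for this subject
    return "sporadic"       # only some features are missing


# Hypothetical toy data: 4 subjects x 3 proteins (values are made up).
nan = math.nan
proteomics = [
    [0.8, 1.2, -0.3],
    [nan, nan, nan],
    [0.1, nan, 0.7],
    [-1.0, 0.4, 0.2],
]

patterns = [missingness_pattern(r) for r in proteomics]
print(patterns)
```

Under these assumptions, the second subject would be treated as list-wise missing and the third as sporadically missing. A model-based approach such as the one described here would handle both cases during estimation rather than discarding those subjects.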
Results: Simulation studies demonstrate that our integrated imputation approach produces consistent and less biased estimates, closely reflecting true underlying values. We applied this model to data from the ISGlobal/ATHLETE "Exposome Data Challenge Event" to explore the association between maternal exposure to hexachlorobenzene and childhood body mass index by integrating incomplete proteomics data from 1301 children. The model successfully estimated proteomics profiles for two clusters representing higher and lower body mass index, characterizing the potential profiles linking prenatal hexachlorobenzene levels and childhood body mass index.
Availability and implementation: The proposed methods have been implemented in the R package LUCIDus. The source code is available at https://github.com/USCbiostats/LUCIDus.