{"title":"ioSearch: An approach for identifying interacting multiomics biomarkers using a novel algorithm with application on breast cancer data sets","authors":"Sarmistha Das, Deo Kumar Srivastava","doi":"10.1002/gepi.22536","DOIUrl":null,"url":null,"abstract":"<p>Identification of biomarkers by integrating multiple omics together is important because complex diseases occur due to an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals due to the multiple-testing burden. Conversely, multiomics data integration imparts complementary information but suffers from an increased multiple-testing burden, data diversity inherent with different omics features, high-dimensionality, and so forth. Most of the available methods address subtype classification using dimension-reduction techniques to circumvent the sample size issue but interacting multiomics biomarker identification methods are unavailable. We propose a two-step model that first investigates phenotype-omics association using logistic regression. Then, selects disease-associated omics using sparse principal components which explores the interrelationship of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effect of multiple omics with disease and between-omics associations by using pathway information that subsequently reduces the multiple-testing burden. Further, inference in terms of <i>p</i> values potentially makes it an easily interpretable biomarker identification tool. Extensive simulation demonstrates ioSearch as statistically powerful with a controlled Type-I error rate. Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 8","pages":"600-616"},"PeriodicalIF":1.7000,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genetic Epidemiology","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/gepi.22536","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
引用次数: 0
Abstract
Identification of biomarkers by integrating multiple omics together is important because complex diseases occur due to an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals due to the multiple-testing burden. Conversely, multiomics data integration imparts complementary information but suffers from an increased multiple-testing burden, data diversity inherent with different omics features, high-dimensionality, and so forth. Most of the available methods address subtype classification using dimension-reduction techniques to circumvent the sample size issue but interacting multiomics biomarker identification methods are unavailable. We propose a two-step model that first investigates phenotype-omics association using logistic regression. Then, selects disease-associated omics using sparse principal components which explores the interrelationship of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effect of multiple omics with disease and between-omics associations by using pathway information that subsequently reduces the multiple-testing burden. Further, inference in terms of p values potentially makes it an easily interpretable biomarker identification tool. Extensive simulation demonstrates ioSearch as statistically powerful with a controlled Type-I error rate. Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.
期刊介绍:
Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations.
Genetic Epidemiology primarily publishes papers in statistical genetics, a research field that is primarily concerned with development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.