ioSearch: An approach for identifying interacting multiomics biomarkers using a novel algorithm with application on breast cancer data sets

Genetic Epidemiology, 47(8): 600-616 · IF 1.7 · Zone 4 (Medicine) · JCR Q3 (Genetics & Heredity) · Pub Date: 2023-10-05 · DOI: 10.1002/gepi.22536

Sarmistha Das, Deo Kumar Srivastava


Citations: 0

Abstract



ioSearch: An approach for identifying interacting multiomics biomarkers using a novel algorithm with application on breast cancer data sets

Identifying biomarkers by integrating multiple omics is important because complex diseases arise from an intricate interplay of various genetic materials. Traditional single-omics association tests neither explore this crucial interomics dependence nor identify moderately weak signals, owing to the multiple-testing burden. Conversely, multiomics data integration provides complementary information but suffers from an increased multiple-testing burden, the data diversity inherent in different omics features, high dimensionality, and so forth. Most available methods address subtype classification using dimension-reduction techniques to circumvent the sample size issue, but methods for identifying interacting multiomics biomarkers are unavailable. We propose a two-step model that first investigates phenotype-omics association using logistic regression and then selects disease-associated omics using sparse principal components, exploring the interrelationship of multiple variables from two omics in a multivariate multiple regression framework. On the basis of this model, we developed a multiomics biomarker identification algorithm, interacting omics search (ioSearch), that jointly tests the effect of multiple omics on disease and the between-omics associations, using pathway information to reduce the multiple-testing burden. Further, inference in terms of p values potentially makes it an easily interpretable biomarker identification tool. Extensive simulation shows that ioSearch is statistically powerful with a controlled Type-I error rate. Its application to publicly available breast cancer data sets identified relevant omics features in important pathways.
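To make the multiple-testing point concrete, the following is a minimal sketch of how restricting tests to features that share a pathway shrinks the number of hypotheses. It is not the ioSearch implementation: the pathway names, feature names, the two omics layers ("expression" and "methylation"), and the use of a Bonferroni correction are all assumptions made for illustration, since the abstract does not state which correction the authors apply.

```python
from itertools import product


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical toy annotation: pathway -> features from two omics layers.
pathways = {
    "pathway_A": {"expression": ["g1", "g2", "g3"], "methylation": ["m1", "m2"]},
    "pathway_B": {"expression": ["g4", "g5"], "methylation": ["m3", "m4", "m5"]},
}

all_expr = sorted({g for p in pathways.values() for g in p["expression"]})
all_meth = sorted({m for p in pathways.values() for m in p["methylation"]})

# Unrestricted search: every expression feature paired with every methylation feature.
unrestricted_pairs = set(product(all_expr, all_meth))

# Pathway-guided search: only pairs whose two features share a pathway.
pathway_pairs = set()
for members in pathways.values():
    pathway_pairs.update(product(members["expression"], members["methylation"]))

n_unrestricted = len(unrestricted_pairs)
n_pathway = len(pathway_pairs)

print(f"unrestricted: {n_unrestricted} tests, threshold {bonferroni_threshold(n_unrestricted):.5f}")
print(f"pathway-guided: {n_pathway} tests, threshold {bonferroni_threshold(n_pathway):.5f}")
```

With fewer hypotheses, each test is held to a less stringent threshold, which is one way moderately weak signals can become detectable. In real data the gap would be far larger than in this toy annotation, because the number of unrestricted cross-omics pairs grows with the product of the feature counts.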


Source journal

Genetic Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 4.40

Self-citation rate: 9.50%

Review time: 6-12 weeks

About the journal: Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations. Genetic Epidemiology primarily publishes papers in statistical genetics, a research field that is primarily concerned with development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.