Compositional variable selection in quantile regression for microbiome data with false discovery rate control

IF 3.6 | Zone 4 (Mathematics) | Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Statistical Analysis and Data Mining | Pub Date: 2024-03-28 | DOI: 10.1002/sam.11674

Runze Li, Jin Mu, Songshan Yang, Cong Ye, Xiang Zhan


Citations: 0

Abstract

Advances in high‐throughput sequencing technologies have stimulated intense research interest in identifying specific microbial taxa that are associated with disease conditions. Such knowledge is invaluable both for understanding biology and for therapeutic development, as the microbiome is inherently modifiable. Despite the availability of massive data, analysis of microbiome compositional data remains difficult. The constraint that the relative abundances of all components of a microbial community sum to one poses challenges for statistical analysis, especially in high‐dimensional settings, where a common goal is to select a small fraction of signals from among many noisy features. Motivated by studies examining the role of the microbiome in host transcriptomics, we propose a novel approach to identify microbial taxa that are associated with host gene expression. Besides accommodating the compositional nature of microbiome data, our method achieves false discovery rate (FDR)‐controlled variable selection and captures heterogeneity due to either heteroscedastic variance or non‐location‐scale covariate effects displayed in the motivating dataset. We demonstrate the superior performance of our method in extensive numerical simulation studies and then apply it to real‐world microbiome data to gain novel biological insights that are missed by traditional mean‐based linear regression analysis.
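To make two ingredients of the abstract concrete, the sketch below illustrates the sum-to-one constraint on relative abundances and the check (pinball) loss that underlies quantile regression. It is a minimal illustration under stated assumptions, not the authors' method: the function names, the toy counts, the pseudo-count, and the use of a centered log-ratio transform are all hypothetical choices made here, and the FDR-control step is not shown.

```python
import numpy as np

def closure(counts):
    """Convert raw counts to relative abundances so that each row sums to one."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(composition, pseudo=1e-6):
    """Centered log-ratio transform (an assumed, common way to handle compositions).

    `pseudo` is an illustrative zero-replacement value, not taken from the paper.
    """
    logged = np.log(composition + pseudo)
    return logged - logged.mean(axis=1, keepdims=True)

def check_loss(residuals, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    residuals = np.asarray(residuals, dtype=float)
    return residuals * (tau - (residuals < 0))

# Toy data: 2 samples, 3 taxa (made-up numbers for illustration only).
counts = np.array([[10, 30, 60], [5, 5, 90]])
rel = closure(counts)   # each row sums to one
z = clr(rel)            # each row sums to zero after the transform
loss = check_loss([-1.0, 2.0], tau=0.25)
```

Because every row of `rel` sums to one, its columns are linearly dependent on an intercept, which is why compositional covariates need special handling in regression. The check loss weights negative and positive residuals asymmetrically, which is what lets a fit target a quantile other than the mean.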




Source journal

Statistical Analysis and Data Mining: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

CiteScore

3.20

Self-citation rate

7.70%


Journal description: Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques and discuss their application to real problems in a way that is accessible and beneficial to domain experts across science, engineering, and commerce. The journal focuses on papers that satisfy one or more of the following criteria: solve data analysis problems associated with massive, complex datasets; develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, and operations research; formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models; provide surveys of prominent research topics.