Gaussian process selections in semiparametric multi‐kernel machine regression for multi‐pathway analysis

Jiali Lin, Inyoung Kim
Statistical Analysis and Data Mining: The ASA Data Science Journal. Published 2024-06-01. DOI: 10.1002/sam.11699 (https://doi.org/10.1002/sam.11699)

Abstract

Analyzing correlated high‐dimensional data is a challenging problem in genomics, proteomics, and other related areas. For example, it is important to identify significant genetic pathway effects associated with biomarkers, where a gene pathway is a set of genes that work together functionally to regulate a certain biological process. A pathway‐based analysis can detect a subtle change in expression level that cannot be found using a gene‐based analysis. Here, we refer to a pathway as a set and a gene as an element of that set. However, when there are multiple pathways, it is challenging to select automatically which pathways are highly associated with the outcome. In this paper, we propose a semiparametric multikernel regression model to study the effects of fixed covariates (e.g., clinical variables) and sets of elements (e.g., pathways of genes), addressing the problem of detecting signal sets associated with biomarkers. We model the unknown high‐dimensional functions of the multiple sets via multiple Gaussian kernel machines to allow for the possibility that elements within the same set interact with each other. Hence, our variable set selection can be considered a Gaussian process set selection. We develop our Gaussian process set selection under the Bayesian variance component‐selection framework. We incorporate prior knowledge about structural sets by imposing an Ising prior on the model. Our approach can be easily applied in high‐dimensional spaces where the sample size is smaller than the number of variables. An efficient variational Bayes algorithm is developed. We demonstrate the advantages of our approach through simulation studies and through a type II diabetes genetic‐pathway analysis.
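The abstract gives no formulas, so the following is only a minimal sketch of the kind of model it describes, under stated assumptions. It assumes the common kernel machine form y = Xβ + Σ_k h_k(Z_k) + ε, with each pathway function h_k given a Gaussian process prior with covariance τ_k K_k, where K_k is a Gaussian kernel on the genes of pathway k. The function names, the bandwidth parameter `rho`, its default (the number of genes in the pathway), and the simulated data are all hypothetical. The sketch does not implement the paper's Ising prior or its variational Bayes algorithm.

```python
import numpy as np


def gaussian_kernel(Z, rho):
    """Gaussian kernel matrix with K[i, j] = exp(-||z_i - z_j||^2 / rho)."""
    sq_norms = np.sum(Z**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / rho)


def marginal_covariance(pathways, taus, sigma2, rho=None):
    """Covariance of y given X: sigma2 * I + sum_k tau_k * K_k.

    pathways: list of (n, p_k) arrays, one per pathway (assumed layout).
    taus:     variance components; tau_k = 0 removes pathway k.
    rho:      kernel bandwidth; defaults to p_k (an illustrative choice).
    """
    n = pathways[0].shape[0]
    cov = sigma2 * np.eye(n)
    for Z, tau in zip(pathways, taus):
        bandwidth = Z.shape[1] if rho is None else rho
        cov += tau * gaussian_kernel(Z, bandwidth)
    return cov


# Simulated example: 30 samples, three pathways with 5, 8, and 12 genes.
rng = np.random.default_rng(0)
n = 30
pathways = [rng.normal(size=(n, p)) for p in (5, 8, 12)]
taus = [1.5, 0.0, 0.7]  # the second pathway is switched off
cov = marginal_covariance(pathways, taus, sigma2=0.5)
```

In this formulation a pathway whose variance component is zero contributes nothing to the covariance of the outcome, which is why selecting pathways can be framed as selecting nonzero variance components.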