AdaReg: data adaptive robust estimation in linear regression with application in GTEx gene expressions

IF 0.9; Zone 4 (Mathematics); Q3 Mathematics; Statistical Applications in Genetics and Molecular Biology; Pub Date: 2019-12-10; DOI: 10.1101/869362
Meng Wang, Lihua Jiang, M. Snyder
Citations: 3

Abstract

The Genotype-Tissue Expression (GTEx) project provides a valuable resource of large-scale gene expression data across multiple tissue types. In the presence of technical noise and unknown or unmeasured factors, robustly estimating the major tissue effect is challenging. Moreover, different genes exhibit heterogeneous expression across tissue types. We therefore need a robust method that adapts to this heterogeneity to improve estimation of the tissue effect. We followed the robust estimation approach based on the γ-density-power-weight from Fujisawa, H. and Eguchi, S. (2008), Robust parameter estimation with a small bias against heavy contamination, J. Multivariate Anal. 99: 2053–2081, and Windham, M.P. (1995), Robustifying model fitting, J. Roy. Stat. Soc. B: 599–609, where γ is the exponent of the density weight and controls the balance between bias and variance. As far as we know, our work is the first to propose a procedure for tuning γ to balance the bias-variance trade-off under mixture models. We constructed a robust likelihood criterion based on weighted densities in a mixture model of a Gaussian population distribution mixed with an unknown outlier distribution, and developed a data-adaptive γ-selection procedure embedded in the robust estimation. We provided a heuristic analysis of the selection criterion and found, in simulation studies under a series of settings, that on average the trend of our practical selection criterion across values of γ captures the minimizing γ about as well as the trend of the mean squared error (MSE), which cannot be estimated in practice. Our data-adaptive robustifying procedure for linear regression (AdaReg) showed a significant advantage over the fixed-γ procedure and other robust methods, both in simulation studies and in a real-data application estimating the tissue effect of heart samples from the GTEx project. Finally, the paper discusses some limitations of the method and future work.
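
To make the role of γ concrete, the following is a minimal sketch of density-power-weighted linear regression with a fixed γ, assuming Gaussian errors. It is not the authors' AdaReg procedure: the data-adaptive γ selection described in the abstract is not reproduced here, and the function name `gamma_weighted_regression`, the starting values, and the simulated data are all illustrative assumptions. Each observation is weighted in proportion to its fitted Gaussian density raised to the power γ, so points far from the fit receive almost no weight; the factor (1 + γ) in the variance update corrects the shrinkage that this weighting causes under the Gaussian model.

```python
import numpy as np

def gamma_weighted_regression(X, y, gamma=0.5, n_iter=100, tol=1e-8):
    """Fixed-gamma density-power-weighted least squares (illustrative sketch)."""
    # Starting values (an assumption of this sketch): OLS fit and MAD scale.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    mad = np.median(np.abs(resid - np.median(resid)))
    sigma2 = max((1.4826 * mad) ** 2, 1e-12)
    for _ in range(n_iter):
        resid = y - X @ beta
        # Weights proportional to the Gaussian density raised to the power gamma.
        logw = -gamma * resid**2 / (2.0 * sigma2)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Weighted least-squares update of the coefficients.
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        # Variance update; (1 + gamma) undoes the shrinkage from the weighting.
        resid_new = y - X @ beta_new
        sigma2_new = max((1.0 + gamma) * np.sum(w * resid_new**2), 1e-12)
        converged = (np.max(np.abs(beta_new - beta)) < tol
                     and abs(sigma2_new - sigma2) < tol)
        beta, sigma2 = beta_new, sigma2_new
        if converged:
            break
    return beta, sigma2, w

# Simulated example (hypothetical data): 10% of responses are shifted upward.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[:20] += 20.0  # outliers

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat, sigma2_hat, weights = gamma_weighted_regression(X, y, gamma=0.5)
print("OLS coefficients:           ", beta_ols)
print("gamma-weighted coefficients:", beta_hat)
print("total weight on outliers:   ", weights[:20].sum())
```

In this sketch a larger γ downweights outliers more aggressively but discards more information from clean observations, which is the bias-variance trade-off that a data-adaptive choice of γ is meant to balance.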