估计异质分组中具有共同结构的多个网络

IF 1.4 3区 数学 Q2 STATISTICS & PROBABILITY Journal of Multivariate Analysis Pub Date : 2024-02-13 DOI:10.1016/j.jmva.2024.105298
Xing Qin , Jianhua Hu , Shuangge Ma , Mengyun Wu
{"title":"估计异质分组中具有共同结构的多个网络","authors":"Xing Qin ,&nbsp;Jianhua Hu ,&nbsp;Shuangge Ma ,&nbsp;Mengyun Wu","doi":"10.1016/j.jmva.2024.105298","DOIUrl":null,"url":null,"abstract":"<div><p>Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are only applicable to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups, making the proposed estimator significantly advancing from the existing heterogeneity network analysis based on the regularized likelihood of GGM directly and enjoying scale-invariant, tuning-insensitive, and optimization convexity properties. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identifications.</p></div>","PeriodicalId":16431,"journal":{"name":"Journal of Multivariate Analysis","volume":"202 ","pages":"Article 105298"},"PeriodicalIF":1.4000,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Estimation of multiple networks with common structures in heterogeneous subgroups\",\"authors\":\"Xing Qin ,&nbsp;Jianhua Hu ,&nbsp;Shuangge Ma ,&nbsp;Mengyun Wu\",\"doi\":\"10.1016/j.jmva.2024.105298\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are only applicable to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups, making the proposed estimator significantly advancing from the existing heterogeneity network analysis based on the regularized likelihood of GGM directly and enjoying scale-invariant, tuning-insensitive, and optimization convexity properties. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identifications.</p></div>\",\"PeriodicalId\":16431,\"journal\":{\"name\":\"Journal of Multivariate Analysis\",\"volume\":\"202 \",\"pages\":\"Article 105298\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2024-02-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Multivariate Analysis\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0047259X24000058\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Multivariate Analysis","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0047259X24000058","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

摘要

网络估算一直是高维数据分析的重要组成部分,可以帮助人们了解潜在的复杂依赖结构。在现有的研究中,高斯图形模型一直很受欢迎。然而,由于存在均匀分布假设,而且只适用于小规模数据,因此它们仍然存在局限性。例如,癌症具有不同程度的未知异质性,而生物网络包括成千上万的分子成分,在不同亚组之间往往存在差异,同时也有一些共性。在本文中,我们将高斯图形模型(GGM)分解为一系列稀疏回归问题,从而为具有未知样本异质性的多个网络提出了一种新的联合估计方法。本文引入了重参数化技术和复合 minimax 凹惩罚,有效地兼顾了多个子群网络的特殊信息和共性信息,使得所提出的估计方法明显优于现有的直接基于 GGM 正则化似然的异质性网络分析方法,并具有规模不变性、调谐不敏感性和优化凸性等特性。利用并行计算可以有效地实现所提出的分析。估算和选择的一致性得到了严格确立。所提出的方法使理论研究只关注独立网络的估计,并具有理论上和计算上都适用于大规模数据的显著优势。利用模拟数据和 TCGA 乳腺癌数据进行的大量数值实验证明了所提方法在亚组和网络识别方面的突出性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Estimation of multiple networks with common structures in heterogeneous subgroups

Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are only applicable to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups, making the proposed estimator significantly advancing from the existing heterogeneity network analysis based on the regularized likelihood of GGM directly and enjoying scale-invariant, tuning-insensitive, and optimization convexity properties. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identifications.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Multivariate Analysis
Journal of Multivariate Analysis 数学-统计学与概率论
CiteScore
2.40
自引率
25.00%
发文量
108
审稿时长
74 days
期刊介绍: Founded in 1971, the Journal of Multivariate Analysis (JMVA) is the central venue for the publication of new, relevant methodology and particularly innovative applications pertaining to the analysis and interpretation of multidimensional data. The journal welcomes contributions to all aspects of multivariate data analysis and modeling, including cluster analysis, discriminant analysis, factor analysis, and multidimensional continuous or discrete distribution theory. Topics of current interest include, but are not limited to, inferential aspects of Copula modeling Functional data analysis Graphical modeling High-dimensional data analysis Image analysis Multivariate extreme-value theory Sparse modeling Spatial statistics.
期刊最新文献
Maximum likelihood estimation of elliptical tail Covariance parameter estimation of Gaussian processes with approximated functional inputs PDE-regularised spatial quantile regression Diagnostic checking of periodic vector autoregressive time series models with dependent errors A conditional distribution function-based measure for independence and K-sample tests in multivariate data
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1