Estimation of multiple networks with common structures in heterogeneous subgroups

IF 1.4 | Zone 3 (Mathematics) | JCR Q2, Statistics & Probability | Journal of Multivariate Analysis | Pub Date: 2024-02-13 | DOI: 10.1016/j.jmva.2024.105298

Xing Qin, Jianhua Hu, Shuangge Ma, Mengyun Wu

Journal of Multivariate Analysis, Volume 202, Article 105298. Journal Article, published 2024-02-13. Source: https://www.sciencedirect.com/science/article/pii/S0047259X24000058

Citation count: 0

Abstract

Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are only applicable to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups. As a result, the proposed estimator advances significantly beyond existing heterogeneity network analyses, which are based directly on the regularized likelihood of the GGM, and enjoys scale-invariance, tuning-insensitivity, and optimization convexity properties. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identifications.
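
The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it shows only the standard identity that lets a GGM be read as node-by-node regressions (the coefficient of X_k when regressing X_j on the remaining variables is -Theta[j, k] / Theta[j, j]) and the plain, non-composite minimax concave penalty. The function names, the 4-node chain graph, and the default gamma = 3.0 are assumptions chosen for illustration; the paper's reparameterization, composite penalty, and subgroup identification are not reproduced here.

```python
import numpy as np


def nodewise_coefficients(precision: np.ndarray) -> np.ndarray:
    """Population regression coefficients implied by a precision matrix.

    For X ~ N(0, Theta^{-1}), regressing X_j on the other variables gives
    beta[j, k] = -Theta[j, k] / Theta[j, j] for k != j.
    """
    beta = -precision / np.diag(precision)[:, None]
    np.fill_diagonal(beta, 0.0)
    return beta


def regression_from_covariance(cov: np.ndarray, j: int) -> np.ndarray:
    """Least-squares coefficients of X_j on X_{-j}, computed from the covariance."""
    others = [k for k in range(cov.shape[0]) if k != j]
    return np.linalg.solve(cov[np.ix_(others, others)], cov[others, j])


def mcp_penalty(t, lam: float, gamma: float = 3.0) -> np.ndarray:
    """Plain minimax concave penalty, applied elementwise.

    Equals lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, and the constant
    gamma*lam^2/2 beyond that point.
    """
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a <= gamma * lam, lam * a - a**2 / (2 * gamma), gamma * lam**2 / 2)


# Hypothetical 4-node chain graph: only neighbouring nodes are connected.
theta = np.array([
    [2.0, -0.8, 0.0, 0.0],
    [-0.8, 2.0, -0.8, 0.0],
    [0.0, -0.8, 2.0, -0.8],
    [0.0, 0.0, -0.8, 2.0],
])
sigma = np.linalg.inv(theta)
beta = nodewise_coefficients(theta)
```

In this sketch a zero in row j of `beta` corresponds to a missing edge at node j, which is why sparse regression on each node can recover the graph. Each node's regression is a separate problem, which is consistent with the abstract's remark that the analysis can be run in parallel.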

Source journal

Journal of Multivariate Analysis (Mathematics: Statistics & Probability)

CiteScore: 2.40
Self-citation rate: 25.00%
Publication volume: 108
Review time: 74 days

About the journal: Founded in 1971, the Journal of Multivariate Analysis (JMVA) is the central venue for the publication of new, relevant methodology and particularly innovative applications pertaining to the analysis and interpretation of multidimensional data. The journal welcomes contributions to all aspects of multivariate data analysis and modeling, including cluster analysis, discriminant analysis, factor analysis, and multidimensional continuous or discrete distribution theory. Topics of current interest include, but are not limited to, inferential aspects of copula modeling, functional data analysis, graphical modeling, high-dimensional data analysis, image analysis, multivariate extreme-value theory, sparse modeling, and spatial statistics.