Network Reconstruction via the Minimum Description Length Principle

IF 11.6 | Zone 1, Physics and Astrophysics | JCR Q1, PHYSICS, MULTIDISCIPLINARY | Physical Review X | Pub Date: 2025-03-20 | DOI: 10.1103/physrevx.15.011065

Tiago P. Peixoto


Citations: 0

Abstract

A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting and produces an inferred network with a statistically justifiable number of edges and their weight distribution. The status quo in this context is based on L1 regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity, i.e., abundance of zero weights, with weight “shrinkage.” This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster and simpler to employ, as it requires a single fit to the complete data, instead of many fits for multiple data splits and choice of regularization parameter. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of reconstructed edges and their weight distribution to be known in advance. In a series of examples, we also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. 
We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving on the order of 10^4–10^5 species, and demonstrate how the inferred model can be used to predict the outcome of potential interventions and tipping points in the system.
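The coupling between sparsity and shrinkage that the abstract criticizes can be seen in a small sketch. It assumes the textbook special case of an orthonormal design, where the L1-regularized estimate reduces to soft-thresholding of the unregularized one. The weights, the threshold `lam`, and the hard-thresholding comparison are hypothetical illustrations and are not the paper's quantization-based scheme.

```python
def soft_threshold(w, lam):
    """L1 solution for one weight under an orthonormal design (assumption)."""
    mag = max(abs(w) - lam, 0.0)
    if w > 0:
        return mag
    if w < 0:
        return -mag
    return 0.0

def hard_threshold(w, lam):
    """Contrast only: zero small weights, leave the surviving ones unchanged."""
    return w if abs(w) > lam else 0.0

# Hypothetical unregularized weight estimates and regularization strength.
w_raw = [0.05, -0.1, 0.0, 1.0, -2.0]
lam = 0.3

w_l1 = [soft_threshold(w, lam) for w in w_raw]
w_hard = [hard_threshold(w, lam) for w in w_raw]

# Both estimates zero out the same three small weights, but only the
# L1 estimate also pulls the two surviving weights toward zero by lam.
n_zero_l1 = sum(1 for w in w_l1 if w == 0)
n_zero_hard = sum(1 for w in w_hard if w == 0)
print(w_l1, n_zero_l1)
print(w_hard, n_zero_hard)
```

In this toy setting, raising `lam` to remove more spurious edges necessarily increases the bias on the edges that remain, which is the trade-off the abstract describes.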

Published by the American Physical Society, 2025




Source journal: Physical Review X (PHYSICS, MULTIDISCIPLINARY)

CiteScore: 24.60
Self-citation rate: 1.60%
Articles published: 197
Review time: 3 months

About the journal: Physical Review X (PRX) is an exclusively online, fully open-access journal that emphasizes innovation, quality, and enduring impact in the scientific content it publishes. It presents a curated selection of papers from pure, applied, and interdisciplinary physics, and aims to feature work with the potential to shape current and future research and leave a lasting impact on its field. PRX covers the entire spectrum of physics subject areas, with a special focus on groundbreaking interdisciplinary research of broad influence.

Latest articles from this journal

Highly Entangled Stationary States from Strong Symmetries
Topology and Nuclear Size Determine Cell Packing on Growing Lung Spheroids
Strong Orbital-Lattice Coupling Induces Glassy Thermal Conductivity in High-Symmetry Single Crystal BaTiS3
Network Reconstruction via the Minimum Description Length Principle
Theory of Metastable States in Many-Body Quantum Systems