Network Reconstruction via the Minimum Description Length Principle

IF 11.6 | Zone 1 (Physics and Astrophysics) | JCR Q1, PHYSICS, MULTIDISCIPLINARY | Physical Review X | Pub Date: 2025-03-20 | DOI: 10.1103/physrevx.15.011065
Tiago P. Peixoto
{"title":"Network Reconstruction via the Minimum Description Length Principle","authors":"Tiago P. Peixoto","doi":"10.1103/physrevx.15.011065","DOIUrl":null,"url":null,"abstract":"A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting and produces an inferred network with a statistically justifiable number of edges and their weight distribution. The status quo in this context is based on L</a:mi>1</a:mn></a:msub></a:math> regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity, i.e., abundance of zero weights, with weight “shrinkage.” This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster and simpler to employ, as it requires a single fit to the complete data, instead of many fits for multiple data splits and choice of regularization parameter. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of reconstructed edges and their weight distribution to be known in advance. 
In a series of examples, we also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving on the order of <c:math xmlns:c=\"http://www.w3.org/1998/Math/MathML\" display=\"inline\"><c:mrow><c:msup><c:mrow><c:mn>10</c:mn></c:mrow><c:mrow><c:mn>4</c:mn></c:mrow></c:msup><c:mi>–</c:mi><c:msup><c:mrow><c:mn>10</c:mn></c:mrow><c:mrow><c:mn>5</c:mn></c:mrow></c:msup></c:mrow></c:math> species and demonstrate how the inferred model can be used to predict the outcome of potential interventions and tipping points in the system. <jats:supplementary-material> <jats:copyright-statement>Published by the American Physical Society</jats:copyright-statement> <jats:copyright-year>2025</jats:copyright-year> </jats:permissions> </jats:supplementary-material>","PeriodicalId":20161,"journal":{"name":"Physical Review X","volume":"1 1","pages":""},"PeriodicalIF":11.6000,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Physical Review X","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1103/physrevx.15.011065","RegionNum":1,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting and produces an inferred network with a statistically justifiable number of edges and their weight distribution. The status quo in this context is based on L1 regularization combined with cross-validation. However, besides its high computational cost, this commonplace approach unnecessarily ties the promotion of sparsity, i.e., abundance of zero weights, with weight “shrinkage.” This combination forces a trade-off between the bias introduced by shrinkage and the network sparsity, which often results in substantial overfitting even after cross-validation. In this work, we propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization, which does not rely on weight shrinkage to promote sparsity. Our approach follows the minimum description length principle, and uncovers the weight distribution that allows for the most compression of the data, thus avoiding overfitting without requiring cross-validation. The latter property renders our approach substantially faster and simpler to employ, as it requires a single fit to the complete data, instead of many fits for multiple data splits and choice of regularization parameter. As a result, we have a principled and efficient inference scheme that can be used with a large variety of generative models, without requiring the number of reconstructed edges and their weight distribution to be known in advance. In a series of examples, we also demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks. 
We highlight the use of our method with the reconstruction of interaction networks between microbial communities from large-scale abundance samples involving on the order of $10^{4}$–$10^{5}$ species and demonstrate how the inferred model can be used to predict the outcome of potential interventions and tipping points in the system.
Published by the American Physical Society, 2025.
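To illustrate the coupling between sparsity and shrinkage that the abstract criticizes, here is a minimal sketch that is not taken from the paper. It assumes the textbook special case of L1-penalized least squares with an orthonormal design, in which each coefficient is obtained by soft-thresholding. The function name `soft_threshold`, the values in `w_hat`, and the two penalty strengths are hypothetical choices made only for illustration.

```python
import numpy as np


def soft_threshold(w, lam):
    """Soft-thresholding: move each value toward zero by lam, clipping at zero.

    This is the closed-form L1-penalized estimate in the orthonormal-design
    special case (an assumption of this sketch, not the paper's setting).
    """
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


# Hypothetical noisy estimates: the first three entries stand for true edges
# (weight near 1.0), the last three for non-edges (true weight 0).
w_hat = np.array([1.05, 0.95, 1.10, 0.20, -0.15, 0.05])

for lam in (0.1, 0.3):
    shrunk = soft_threshold(w_hat, lam)
    # Report how many weights survive and what the surviving values are.
    print(f"lam={lam}: nonzero={np.count_nonzero(shrunk)}, weights={shrunk}")
```

With the smaller penalty, two of the three spurious weights remain nonzero. With the larger penalty all three are removed, but every true weight is also reduced by 0.3. In this toy setting a single penalty cannot remove the spurious weights without biasing the genuine ones, which is the trade-off the abstract says the proposed scheme avoids.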
Source journal
Physical Review X (PHYSICS, MULTIDISCIPLINARY)
CiteScore: 24.60
Self-citation rate: 1.60%
Articles published: 197
Review time: 3 months
About the journal: Physical Review X (PRX) is an exclusively online, fully open-access journal that emphasizes innovation, quality, and enduring impact in the scientific content it publishes. It presents a curated selection of papers from pure, applied, and interdisciplinary physics, and aims to feature work with the potential to shape current and future research and to have a lasting impact in its field. PRX covers the entire spectrum of physics subject areas, with a special focus on groundbreaking interdisciplinary research with broad influence.