Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms

Rayna Andreeva, Benjamin Dupuis, Rik Sarkar, Tolga Birdal, Umut Şimşekli
{"title":"Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms","authors":"Rayna Andreeva, Benjamin Dupuis, Rik Sarkar, Tolga Birdal, Umut Şimşekli","doi":"arxiv-2407.08723","DOIUrl":null,"url":null,"abstract":"We present a novel set of rigorous and computationally efficient\ntopology-based complexity notions that exhibit a strong correlation with the\ngeneralization gap in modern deep neural networks (DNNs). DNNs show remarkable\ngeneralization properties, yet the source of these capabilities remains\nelusive, defying the established statistical learning theory. Recent studies\nhave revealed that properties of training trajectories can be indicative of\ngeneralization. Building on this insight, state-of-the-art methods have\nleveraged the topology of these trajectories, particularly their fractal\ndimension, to quantify generalization. Most existing works compute this\nquantity by assuming continuous- or infinite-time training dynamics,\ncomplicating the development of practical estimators capable of accurately\npredicting generalization without access to test data. In this paper, we\nrespect the discrete-time nature of training trajectories and investigate the\nunderlying topological quantities that can be amenable to topological data\nanalysis tools. This leads to a new family of reliable topological complexity\nmeasures that provably bound the generalization error, eliminating the need for\nrestrictive geometric assumptions. These measures are computationally friendly,\nenabling us to propose simple yet effective algorithms for computing\ngeneralization indices. Moreover, our flexible framework can be extended to\ndifferent domains, tasks, and architectures. Our experimental results\ndemonstrate that our new complexity measures correlate highly with\ngeneralization error in industry-standards architectures such as transformers\nand deep graph networks. Our approach consistently outperforms existing\ntopological bounds across a wide range of datasets, models, and optimizers,\nhighlighting the practical relevance and effectiveness of our complexity\nmeasures.","PeriodicalId":501119,"journal":{"name":"arXiv - MATH - Algebraic Topology","volume":"80 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Algebraic Topology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.08723","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We present a novel set of rigorous and computationally efficient topology-based complexity notions that exhibit a strong correlation with the generalization gap in modern deep neural networks (DNNs). DNNs show remarkable generalization properties, yet the source of these capabilities remains elusive, defying established statistical learning theory. Recent studies have revealed that properties of training trajectories can be indicative of generalization. Building on this insight, state-of-the-art methods have leveraged the topology of these trajectories, particularly their fractal dimension, to quantify generalization. Most existing works compute this quantity by assuming continuous- or infinite-time training dynamics, complicating the development of practical estimators capable of accurately predicting generalization without access to test data. In this paper, we respect the discrete-time nature of training trajectories and investigate the underlying topological quantities that are amenable to topological data analysis tools. This leads to a new family of reliable topological complexity measures that provably bound the generalization error, eliminating the need for restrictive geometric assumptions. These measures are computationally friendly, enabling us to propose simple yet effective algorithms for computing generalization indices. Moreover, our flexible framework can be extended to different domains, tasks, and architectures. Our experimental results demonstrate that our new complexity measures correlate highly with generalization error in industry-standard architectures such as transformers and deep graph networks. Our approach consistently outperforms existing topological bounds across a wide range of datasets, models, and optimizers, highlighting the practical relevance and effectiveness of our complexity measures.
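To make the "computationally friendly" claim concrete, below is a minimal Python sketch of one measure in the spirit of this family: an alpha-weighted sum of 0-dimensional persistent-homology lifetimes computed over the point cloud of weight iterates recorded along a training trajectory. It relies on the standard fact that, for a Vietoris-Rips filtration, the 0-dimensional lifetimes coincide with the edge lengths of the Euclidean minimum spanning tree, so the quantity reduces to an MST computation. The function name, the random-walk toy trajectory, and the choice alpha = 1 are illustrative assumptions, not the authors' exact estimator.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def alpha_weighted_lifetime_sum(trajectory, alpha=1.0):
    # trajectory: (n_iterates, n_params) array of flattened model weights
    # recorded at successive optimizer steps.
    # For a Vietoris-Rips filtration, the 0-dimensional persistence
    # lifetimes equal the Euclidean minimum-spanning-tree edge lengths,
    # so no full persistence algorithm is needed.
    dists = squareform(pdist(trajectory))   # dense pairwise-distance matrix
    mst = minimum_spanning_tree(dists)      # sparse CSR matrix of MST edges
    lifetimes = mst.data                    # the n_iterates - 1 edge lengths
    return float(np.sum(lifetimes ** alpha))

# Toy usage: a random-walk "trajectory" of 50 iterates of a 10-parameter model.
rng = np.random.default_rng(0)
fake_trajectory = rng.normal(size=(50, 10)).cumsum(axis=0)
print(alpha_weighted_lifetime_sum(fake_trajectory, alpha=1.0))

The sketch only illustrates that such trajectory-based complexities are cheap to evaluate: for n recorded iterates the cost is dominated by the O(n^2) pairwise distances, with no dependence on test data.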