SIAM Journal on Mathematics of Data Science: Latest Articles

Time-inhomogeneous diffusion geometry and topology

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-03-28 DOI: 10.48550/arXiv.2203.14860

G. Huguet, Alexander Tong, Bastian Alexander Rieck, Je-chun Huang, Manik Kuchroo, M. Hirn, Guy Wolf, Smita Krishnaswamy

Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic condensation homology. We use this intrinsic topology as well as the ambient persistent homology of the condensation process to study how the data changes over diffusion time. We demonstrate both types of topological information in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
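
The condensation loop described in this abstract (compute a diffusion operator from the current data, then apply it to the data) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Gaussian kernel with a fixed bandwidth and a simple stopping rule, and the names `diffusion_operator` and `condense`, as well as all parameter values, are hypothetical.

```python
import numpy as np


def diffusion_operator(points, bandwidth):
    """Row-stochastic Gaussian diffusion operator built from the current points."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / bandwidth)  # assumed kernel choice
    return kernel / kernel.sum(axis=1, keepdims=True)


def condense(points, bandwidth=1.0, n_steps=50, tol=1e-6):
    """Time-inhomogeneous process: the operator is recomputed at every step."""
    history = [np.asarray(points, dtype=float).copy()]
    for _ in range(n_steps):
        current = history[-1]
        P = diffusion_operator(current, bandwidth)  # first compute ...
        history.append(P @ current)                 # ... then apply to the data
        spread = history[-1].max(axis=0) - history[-1].min(axis=0)
        if spread.max() < tol:  # all points have (numerically) merged
            break
    return history
```

Because every new point is a convex combination of the previous points, the bounding box of the data can only shrink from one step to the next, which is the kind of contraction that the convergence bounds in the abstract quantify.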

Vol. 5 (1), pp. 346-372

Citations: 4

Optimality Conditions for Nonsmooth Nonconvex-Nonconcave Min-Max Problems and Generative Adversarial Networks

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-03-21 DOI: 10.1137/22m1482238

Jie Jiang, Xiaojun Chen

This paper considers a class of nonsmooth nonconvex-nonconcave min-max problems in machine learning and games. We first provide sufficient conditions for the existence of global minimax points and local minimax points. Next, we establish the first-order and second-order optimality conditions for local minimax points by using directional derivatives. These conditions reduce to smooth min-max problems with Fréchet derivatives. We apply our theoretical results to generative adversarial networks (GANs) in which two neural networks contest with each other in a game. Examples are used to illustrate applications of the new theory for training GANs.

Citations: 8

Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-03-14 DOI: 10.1137/22m1484201

Erhan Bayraktar, A. D. Kara

We study a Q-learning algorithm for continuous-time stochastic control problems. The proposed algorithm uses the sampled state process by discretizing the state and control action spaces under piecewise constant control processes. We show that the algorithm converges to the optimality equation of a finite Markov decision process (MDP). Using this MDP model, we provide an upper bound for the approximation error for the optimal value function of the continuous-time control problem. Furthermore, we present provable upper bounds for the performance loss of the learned control process compared to the optimal admissible control process of the original problem. The provided error upper bounds are functions of the time and space discretization parameters, and they reveal the effect of different levels of the approximation: (i) approximation of the continuous-time control problem by an MDP, (ii) use of piecewise constant control processes, (iii) space discretization. Finally, we state a time complexity bound for the proposed algorithm as a function of the time and space discretization parameters.
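
To make the three approximation levels in this abstract concrete, here is a rough sketch of tabular Q-learning on a sampled, discretized controlled diffusion. Everything specific is an assumption made for illustration and is not taken from the paper: the one-dimensional dynamics dX = u dt + sigma dW, the quadratic running cost, the clipping of the state to a bounded grid, the uniform exploration policy, and the name `q_learning_discretized`.

```python
import numpy as np


def q_learning_discretized(n_bins=21, actions=(-1.0, 0.0, 1.0), h=0.1,
                           sigma=0.5, beta=1.0, n_samples=20000, seed=0):
    """Tabular Q-learning on a time- and space-discretized controlled diffusion."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-2.0, 2.0, n_bins)      # (iii) space discretization
    Q = np.zeros((n_bins, len(actions)))
    visits = np.zeros_like(Q)
    discount = np.exp(-beta * h)               # discount over one sampling period
    x = 0.0
    for _ in range(n_samples):
        s = int(np.abs(grid - x).argmin())     # nearest grid cell
        a = int(rng.integers(len(actions)))    # uniform exploration (assumed)
        u = actions[a]                         # (ii) control held constant for time h
        cost = (x ** 2 + u ** 2) * h           # hypothetical running cost
        # (i) sample the state process with an Euler-Maruyama step of length h
        x_next = x + u * h + sigma * np.sqrt(h) * rng.normal()
        x_next = float(np.clip(x_next, grid[0], grid[-1]))
        s_next = int(np.abs(grid - x_next).argmin())
        visits[s, a] += 1
        lr = 1.0 / visits[s, a]                # decaying step size
        target = cost + discount * Q[s_next].min()
        Q[s, a] += lr * (target - Q[s, a])
        x = x_next
    return grid, Q
```

The step size `h` and the grid resolution `n_bins` play the role of the time and space discretization parameters on which the error bounds in the abstract depend.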

Citations: 3

Accelerated Bregman Primal-Dual Methods Applied to Optimal Transport and Wasserstein Barycenter Problems

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-03-02 DOI: 10.1137/22m1481865

A. Chambolle, Juan Pablo Contreras

This paper discusses the efficiency of Hybrid Primal-Dual (HPD) type algorithms for approximately solving discrete Optimal Transport (OT) and Wasserstein Barycenter (WB) problems, with and without entropic regularization. Our first contribution is an analysis showing that these methods yield state-of-the-art convergence rates, both theoretically and practically. Next, we extend the HPD algorithm with linesearch proposed by Malitsky and Pock in 2018 to the setting where the dual space has a Bregman divergence and the dual function is relatively strongly convex with respect to the Bregman kernel. This extension yields a new method for OT and WB problems based on smoothing of the objective that also achieves state-of-the-art convergence rates. Finally, we introduce a new Bregman divergence based on a scaled entropy function that makes the algorithm numerically stable and reduces the smoothing, leading to sparse solutions of OT and WB problems. We complement our findings with numerical experiments and comparisons.

Citations: 13

Stability of Deep Neural Networks via discrete rough paths

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-01-19 DOI: 10.1137/22M1472358

Christian Bayer, P. Friz, N. Tapia

Using rough path techniques, we provide a priori estimates for the output of Deep Residual Neural Networks in terms of both the input data and the (trained) network weights. As trained network weights are typically very rough when seen as functions of the layer, we propose to derive stability bounds in terms of the total $p$-variation of trained weights for any $p \in [1,3]$. Unlike the $C^1$-theory underlying the neural ODE literature, our estimates remain bounded even in the limiting case of weights behaving like Brownian motions, as suggested in [arXiv:2105.12245]. Mathematically, we interpret residual neural networks as solutions to (rough) difference equations, and analyse them based on recent results on discrete-time signatures and rough path theory.
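
For readers unfamiliar with $p$-variation, the quantity can be computed exactly for a finite sequence (for instance, weights indexed by layer) with a short dynamic program. The sketch below computes the plain $p$-variation of a discrete path, $\left(\sup_{\text{partitions}} \sum_i \|x_{t_{i+1}} - x_{t_i}\|^p\right)^{1/p}$; it is an illustration only, the function name `p_variation` is hypothetical, and the paper's bounds may rely on a rough-path version of this quantity that includes higher-order terms.

```python
import numpy as np


def p_variation(path, p):
    """Exact p-variation of a discrete path via O(n^2) dynamic programming.

    best[j] holds the largest sum of p-th powers of increments over all
    partitions of the index range 0..j that end at index j.
    """
    path = np.asarray(path, dtype=float)
    if path.ndim == 1:
        path = path[:, None]  # treat a scalar sequence as a 1-D path
    n = path.shape[0]
    best = np.zeros(n)
    for j in range(1, n):
        # Euclidean norm of the increment from every earlier index i to j
        increments = np.linalg.norm(path[j] - path[:j], axis=1) ** p
        best[j] = np.max(best[:j] + increments)
    return best[-1] ** (1.0 / p)
```

For example, `p_variation([0.0, 1.0, 0.0], 1)` is the usual total variation of the sequence, while larger `p` penalizes many small oscillations less, which is why rough sequences of weights can still have finite $p$-variation for $p > 1$.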

Citations: 3

Efficient Global Optimization of Two-layer ReLU Networks: Quadratic-time Algorithms and Adversarial Training

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-01-06 DOI: 10.1137/21m1467134

Yatong Bai, Tanmay Gautam, S. Sojoudi

The non-convexity of the artificial neural network (ANN) training landscape brings inherent optimization difficulties. While the traditional back-propagation stochastic gradient descent (SGD) algorithm and its variants are effective in certain cases, they can become stuck at spurious local minima and are sensitive to initializations and hyperparameters. Recent work has shown that the training of an ANN with ReLU activations can be reformulated as a convex program, bringing hope to globally optimizing interpretable ANNs. However, naively solving the convex training formulation has an exponential complexity, and even an approximation heuristic requires cubic time. In this work, we characterize the quality of this approximation and develop two efficient algorithms that train ANNs with global convergence guarantees. The first algorithm is based on the alternating direction method of multipliers (ADMM). It solves both the exact convex formulation and the approximate counterpart. Linear global convergence is achieved, and the initial several iterations often yield a solution with high prediction accuracy. When solving the approximate formulation, the per-iteration time complexity is quadratic. The second algorithm, based on the "sampled convex programs" theory, is simpler to implement. It solves unconstrained convex formulations and converges to an approximately globally optimal classifier. The non-convexity of the ANN training landscape is exacerbated when adversarial training is considered. We apply the robust convex optimization theory to convex training and develop convex formulations that train ANNs robust to adversarial inputs. Our analysis explicitly focuses on one-hidden-layer fully connected ANNs, but can extend to more sophisticated architectures.

Citations: 8

Sensitivity-Informed Provable Pruning of Neural Networks

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-01-01 DOI: 10.1137/20m1383239

Cenk Baykal, Lucas Liebenwein, Igor Gilitschenski, Dan Feldman, Daniela Rus

Citations: 3

Algebraic Foundations for Applied Topology and Data Analysis

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-01-01 DOI: 10.1007/978-3-031-06664-1

H. Schenck

Citations: 1

Biwhitening Reveals the Rank of a Count Matrix

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2022-01-01 DOI: 10.1137/21m1456807

Boris Landa, Thomas T C K Zhang, Yuval Kluger

Estimating the rank of a corrupted data matrix is an important task in data analysis, most notably for choosing the number of components in PCA. Significant progress on this task was achieved using random matrix theory by characterizing the spectral properties of large noise matrices. However, utilizing such tools is not straightforward when the data matrix consists of count random variables, e.g., Poisson, in which case the noise can be heteroskedastic with an unknown variance in each entry. In this work, we focus on a Poisson random matrix with independent entries and propose a simple procedure, termed biwhitening, for estimating the rank of the underlying signal matrix (i.e., the Poisson parameter matrix) without any prior knowledge. Our approach is based on the key observation that one can scale the rows and columns of the data matrix simultaneously so that the spectrum of the corresponding noise agrees with the standard Marchenko-Pastur (MP) law, justifying the use of the MP upper edge as a threshold for rank selection. Importantly, the required scaling factors can be estimated directly from the observations by solving a matrix scaling problem via the Sinkhorn-Knopp algorithm. Aside from the Poisson, our approach is extended to families of distributions that satisfy a quadratic relation between the mean and the variance, such as the generalized Poisson, binomial, negative binomial, gamma, and many others. This quadratic relation can also account for missing entries in the data. We conduct numerical experiments that corroborate our theoretical findings, and showcase the advantage of our approach for rank estimation in challenging regimes. Furthermore, we demonstrate the favorable performance of our approach on several real datasets of single-cell RNA sequencing (scRNA-seq), High-Throughput Chromosome Conformation Capture (Hi-C), and document topic modeling.
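
The procedure outlined in this abstract can be sketched in a few lines. The following is a simplified illustration, not the authors' code. It assumes a Poisson count matrix `Y` of shape m x n with no all-zero rows or columns, uses the counts themselves as variance estimates, scales them with Sinkhorn-Knopp so that rows sum to n and columns sum to m, and compares the eigenvalues of the scaled data (normalized by n) with the Marchenko-Pastur upper edge $(1 + \sqrt{m/n})^2$. The normalization convention and the names `sinkhorn_knopp` and `biwhiten_rank` are assumptions.

```python
import numpy as np


def sinkhorn_knopp(A, row_sums, col_sums, n_iter=500):
    """Find positive u, v such that diag(u) @ A @ diag(v) has the given marginals."""
    u = np.ones(A.shape[0])
    v = np.ones(A.shape[1])
    for _ in range(n_iter):
        u = row_sums / (A @ v)     # match the row sums
        v = col_sums / (A.T @ u)   # match the column sums
    return u, v


def biwhiten_rank(Y, n_iter=500):
    """Estimate the rank of the signal underlying a Poisson count matrix Y."""
    m, n = Y.shape
    # For Poisson noise the variance equals the mean, so Y estimates the variances.
    u, v = sinkhorn_knopp(Y, np.full(m, float(n)), np.full(n, float(m)), n_iter)
    # Scaling the data by sqrt(u), sqrt(v) scales the variances by u, v.
    Y_hat = np.sqrt(u)[:, None] * Y * np.sqrt(v)[None, :]
    eigvals = np.linalg.svd(Y_hat, compute_uv=False) ** 2 / n
    mp_edge = (1.0 + np.sqrt(m / n)) ** 2   # Marchenko-Pastur upper edge
    return int(np.sum(eigvals > mp_edge)), u, v
```

A typical call would be `rank, u, v = biwhiten_rank(Y)`. Eigenvalues above the edge are attributed to signal; in finite samples the largest noise eigenvalue fluctuates around the edge, so the count can occasionally be off by one.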

Vol. 4 (4), pp. 1420-1446. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10417917/pdf/nihms-1888877.pdf

Citations: 13

Satisficing Paths and Independent Multiagent Reinforcement Learning in Stochastic Games

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-10-09 DOI: 10.1137/22m1515112

Bora Yongacoglu, Gürdal Arslan, S. Yuksel

In multi-agent reinforcement learning (MARL), independent learners are those that do not observe the actions of other agents in the system. Due to the decentralization of information, it is challenging to design independent learners that drive play to equilibrium. This paper investigates the feasibility of using satisficing dynamics to guide independent learners to approximate equilibrium in stochastic games. For $\epsilon \geq 0$, an $\epsilon$-satisficing policy update rule is any rule that instructs the agent to not change its policy when it is $\epsilon$-best-responding to the policies of the remaining players; $\epsilon$-satisficing paths are defined to be sequences of joint policies obtained when each agent uses some $\epsilon$-satisficing policy update rule to select its next policy. We establish structural results on the existence of $\epsilon$-satisficing paths into $\epsilon$-equilibrium in both symmetric $N$-player games and general stochastic games with two players. We then present an independent learning algorithm for $N$-player symmetric games and give high-probability guarantees of convergence to $\epsilon$-equilibrium under self-play. This guarantee is made using symmetry alone, leveraging the previously unexploited structure of $\epsilon$-satisficing paths.
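
The definition of an $\epsilon$-satisficing update rule can be illustrated in a deliberately simplified setting: a stateless two-player matrix game with pure policies, rather than the stochastic games studied in the paper. In this sketch an agent keeps its action whenever it is $\epsilon$-best-responding and otherwise resamples uniformly at random, which is just one admissible rule among many. The names `satisficing_update` and `satisficing_path` are hypothetical.

```python
import numpy as np


def satisficing_update(payoffs, current, epsilon, rng):
    """Keep the current action if it is an epsilon-best response; else resample.

    payoffs[k] is the agent's payoff for action k against the others' fixed play.
    """
    if payoffs[current] >= payoffs.max() - epsilon:
        return current                        # satisfied: must not change
    return int(rng.integers(len(payoffs)))    # unsatisfied: any choice is allowed


def satisficing_path(A, B, start, epsilon, n_steps, seed=0):
    """Sequence of joint actions when both players use the rule above.

    A[i, j] and B[i, j] are the payoffs of the row and column player.
    """
    rng = np.random.default_rng(seed)
    a, b = start
    path = [(a, b)]
    for _ in range(n_steps):
        # Both players update simultaneously against the current joint action.
        new_a = satisficing_update(A[:, b], a, epsilon, rng)
        new_b = satisficing_update(B[a, :], b, epsilon, rng)
        a, b = new_a, new_b
        path.append((a, b))
    return path
```

Once a joint action is an $\epsilon$-equilibrium, every player is satisfied and the path stays there, which is why the existence of satisficing paths into equilibrium matters for convergence.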

Citations: 9
