Regret Analysis of Online LQR Control via Trajectory Prediction and Tracking: Extended Version
Pub Date: 2023-02-21. DOI: 10.48550/arXiv.2302.10411
Yitian Chen, Timothy L. Molloy, T. Summers, I. Shames
In this paper, we propose and analyze a new method for online linear quadratic regulator (LQR) control with a priori unknown time-varying cost matrices. The cost matrices are revealed sequentially, with the potential for future values to be previewed over a short window. Our method uses the available cost matrices to predict the optimal trajectory and a tracking controller to drive the system towards it. We adopt the notion of dynamic regret to measure the performance of the proposed online LQR control method; our main result is that the (dynamic) regret of our method is upper bounded by a constant. Moreover, the regret upper bound decays exponentially with the preview window length and extends to systems with disturbances. We show in simulations that our proposed method offers improved performance compared to other previously proposed online LQR methods.
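A minimal numpy sketch of the predict-then-track structure described above, under simplifying assumptions: known dynamics (A, B), a user-chosen terminal weight, and a pre-designed stabilizing tracking gain K_track. It illustrates the idea only and does not reproduce the paper's exact prediction and tracking laws or its regret analysis.

import numpy as np

def predict_optimal_trajectory(A, B, x0, Q_seq, R_seq, Q_terminal):
    # Finite-horizon LQR over the previewed cost matrices (backward Riccati pass),
    # followed by a forward rollout that yields the predicted optimal trajectory.
    P, gains = Q_terminal, []
    for Q, R in zip(reversed(Q_seq), reversed(R_seq)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()
    x_ref, u_ref = [x0], []
    for K in gains:
        u_ref.append(-K @ x_ref[-1])
        x_ref.append(A @ x_ref[-1] + B @ u_ref[-1])
    return np.array(x_ref), np.array(u_ref)

def tracking_control(x, x_ref_t, u_ref_t, K_track):
    # Feed-forward from the predicted trajectory plus feedback toward it.
    return u_ref_t - K_track @ (x - x_ref_t)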
{"title":"Regret Analysis of Online LQR Control via Trajectory Prediction and Tracking: Extended Version","authors":"Yitian Chen, Timothy L. Molloy, T. Summers, I. Shames","doi":"10.48550/arXiv.2302.10411","DOIUrl":"https://doi.org/10.48550/arXiv.2302.10411","url":null,"abstract":"In this paper, we propose and analyze a new method for online linear quadratic regulator (LQR) control with a priori unknown time-varying cost matrices. The cost matrices are revealed sequentially with the potential for future values to be previewed over a short window. Our novel method involves using the available cost matrices to predict the optimal trajectory, and a tracking controller to drive the system towards it. We adopted the notion of dynamic regret to measure the performance of this proposed online LQR control method, with our main result being that the (dynamic) regret of our method is upper bounded by a constant. Moreover, the regret upper bound decays exponentially with the preview window length, and is extendable to systems with disturbances. We show in simulations that our proposed method offers improved performance compared to other previously proposed online LQR methods.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128669982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs
Pub Date: 2023-02-08. DOI: 10.48550/arXiv.2302.03811
Yashaswini Murthy, Mehrdad Moharrami, R. Srikant
Modified policy iteration (MPI), also known as optimistic policy iteration, is at the core of many reinforcement learning algorithms. It works by combining elements of policy iteration and value iteration. The convergence of MPI has been well studied for discounted and average-cost MDPs. In this work, we consider the exponential-cost risk-sensitive MDP formulation, which is known to provide some robustness to model parameters. Although policy iteration and value iteration have been well studied in the context of risk-sensitive MDPs, modified policy iteration is relatively unexplored. We provide the first proof that MPI also converges for the risk-sensitive problem in the case of finite state and action spaces. Since the exponential-cost formulation deals with a multiplicative Bellman equation, our main contribution is a convergence proof that is quite different from existing results for discounted and risk-neutral average-cost problems. A proof of approximate modified policy iteration for risk-sensitive MDPs is also provided in the appendix.
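To make the MPI structure concrete, below is a small illustrative sketch that alternates a greedy improvement step with m partial evaluation sweeps under a multiplicative (exponential-cost) Bellman operator on a finite MDP. The operator form, the risk parameter gamma, and the normalization used to keep the iterates bounded are illustrative assumptions, not the paper's exact construction.

import numpy as np

def risk_sensitive_mpi(P, c, gamma=1.0, m=5, iters=200):
    # P: transition tensor of shape (S, A, S); c: one-step costs of shape (S, A).
    S, A, _ = P.shape
    V = np.ones(S)                      # positive value iterate for the multiplicative operator
    for _ in range(iters):
        # Policy improvement: greedy with respect to the multiplicative Bellman operator.
        Q = np.exp(gamma * c) * np.einsum('sat,t->sa', P, V)
        pi = Q.argmin(axis=1)
        # Modified (partial) policy evaluation: m sweeps instead of an exact solve.
        for _ in range(m):
            V = np.exp(gamma * c[np.arange(S), pi]) * (P[np.arange(S), pi] @ V)
            V = V / V[0]                # normalization to keep the iterates bounded (illustrative)
    return pi, V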
{"title":"Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs","authors":"Yashaswini Murthy, Mehrdad Moharrami, R. Srikant","doi":"10.48550/arXiv.2302.03811","DOIUrl":"https://doi.org/10.48550/arXiv.2302.03811","url":null,"abstract":"Modified policy iteration (MPI) also known as optimistic policy iteration is at the core of many reinforcement learning algorithms. It works by combining elements of policy iteration and value iteration. The convergence of MPI has been well studied in the case of discounted and average-cost MDPs. In this work, we consider the exponential cost risk-sensitive MDP formulation, which is known to provide some robustness to model parameters. Although policy iteration and value iteration have been well studied in the context of risk sensitive MDPs, modified policy iteration is relatively unexplored. We provide the first proof that MPI also converges for the risk-sensitive problem in the case of finite state and action spaces. Since the exponential cost formulation deals with the multiplicative Bellman equation, our main contribution is a convergence proof which is quite different than existing results for discounted and risk-neutral average-cost problems. The proof of approximate modified policy iteration for risk sensitive MDPs is also provided in the appendix.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"440 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123839237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Certified Invertibility in Neural Networks via Mixed-Integer Programming
Pub Date: 2023-01-27. DOI: 10.48550/arXiv.2301.11783
Tianqi Cui, Tom S. Bertalan, George J. Pappas, M. Morari, I. Kevrekidis, Mahyar Fazlyab
Neural networks are known to be vulnerable to adversarial attacks, which are small, imperceptible perturbations that can significantly alter the network's output. Conversely, there may exist large, meaningful perturbations that do not affect the network's decision (excessive invariance). In our research, we investigate this latter phenomenon in two contexts: (a) discrete-time dynamical system identification, and (b) the calibration of a neural network's output to that of another network. We examine noninvertibility through the lens of mathematical optimization, where the global solution measures the "safety" of the network predictions by their distance from the noninvertibility boundary. We formulate mixed-integer programs (MIPs) for ReLU networks and $L_p$ norms ($p = 1, 2, \infty$) that apply to neural network approximators of dynamical systems. We also discuss how our findings can be useful for invertibility certification in transformations between neural networks, e.g., between different levels of network pruning.
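For orientation, the standard big-M mixed-integer encoding of a single ReLU unit $y = \max(0, w^\top x + b)$ with known pre-activation bounds $l \le w^\top x + b \le u$ is the building block on which such MIP formulations are typically based (the paper's full formulation, including the $L_p$ objective measuring distance to the noninvertibility boundary, is more involved):

\[
y \ge w^\top x + b, \qquad y \ge 0, \qquad y \le w^\top x + b - l\,(1 - a), \qquad y \le u\, a, \qquad a \in \{0, 1\}.
\]

With $a = 1$ the constraints force $y = w^\top x + b$ (active unit), and with $a = 0$ they force $y = 0$ (inactive unit).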
{"title":"Certified Invertibility in Neural Networks via Mixed-Integer Programming","authors":"Tianqi Cui, Tom S. Bertalan, George J. Pappas, M. Morari, I. Kevrekidis, Mahyar Fazlyab","doi":"10.48550/arXiv.2301.11783","DOIUrl":"https://doi.org/10.48550/arXiv.2301.11783","url":null,"abstract":"Neural networks are known to be vulnerable to adversarial attacks, which are small, imperceptible perturbations that can significantly alter the network's output. Conversely, there may exist large, meaningful perturbations that do not affect the network's decision (excessive invariance). In our research, we investigate this latter phenomenon in two contexts: (a) discrete-time dynamical system identification, and (b) the calibration of a neural network's output to that of another network. We examine noninvertibility through the lens of mathematical optimization, where the global solution measures the ``safety\"of the network predictions by their distance from the non-invertibility boundary. We formulate mixed-integer programs (MIPs) for ReLU networks and $L_p$ norms ($p=1,2,infty$) that apply to neural network approximators of dynamical systems. We also discuss how our findings can be useful for invertibility certification in transformations between neural networks, e.g. between different levels of network pruning.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128823767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online switching control with stability and regret guarantees
Pub Date: 2023-01-20. DOI: 10.48550/arXiv.2301.08445
Yingying Li, James A. Preiss, Na Li, Yiheng Lin, A. Wierman, J. Shamma
This paper considers online switching control with a finite candidate controller pool, an unknown dynamical system, and unknown cost functions. The candidate controllers can be unstabilizing policies. We require only that at least one candidate controller satisfies certain stability properties, but we do not know which one is stabilizing. We design an online algorithm that guarantees finite-gain stability throughout the duration of its execution. We also provide a sublinear policy regret guarantee compared with the optimal stabilizing candidate controller. Lastly, we numerically test our algorithm on quadrotor planar flights and compare it with a classical switching control algorithm, falsification-based switching, and a classical multi-armed bandit algorithm, Exp3 with batches.
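For context, a toy sketch of the falsification-based switching comparator mentioned above: run the current candidate controller until the observed data falsify a simple stability test, then move on to the next candidate. Both the test (a bound on state-norm growth over a sliding window) and the switching rule are illustrative placeholders, not the paper's algorithm or its guarantees.

import numpy as np

def falsification_switching(candidates, step, x0, horizon, growth_bound=1.05, window=20):
    # candidates: list of state-feedback policies u = pi(x); step: environment transition x' = step(x, u).
    idx, x = 0, x0
    norms, trajectory = [np.linalg.norm(x0)], [x0]
    for _ in range(horizon):
        u = candidates[idx](x)
        x = step(x, u)
        norms.append(np.linalg.norm(x))
        trajectory.append(x)
        # Falsify the current candidate if the state norm keeps growing over the window.
        if len(norms) > window and norms[-1] > growth_bound * norms[-window]:
            idx = min(idx + 1, len(candidates) - 1)
            norms = norms[-1:]
    return trajectory, idx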
{"title":"Online switching control with stability and regret guarantees","authors":"Yingying Li, James A. Preiss, Na Li, Yiheng Lin, A. Wierman, J. Shamma","doi":"10.48550/arXiv.2301.08445","DOIUrl":"https://doi.org/10.48550/arXiv.2301.08445","url":null,"abstract":"This paper considers online switching control with a finite candidate controller pool, an unknown dynamical system, and unknown cost functions. The candidate controllers can be unstabilizing policies. We only require at least one candidate controller to satisfy certain stability properties, but we do not know which one is stabilizing. We design an online algorithm that guarantees finite-gain stability throughout the duration of its execution. We also provide a sublinear policy regret guarantee compared with the optimal stabilizing candidate controller. Lastly, we numerically test our algorithm on quadrotor planar flights and compare it with a classical switching control algorithm, falsification-based switching, and a classical multi-armed bandit algorithm, Exp3 with batches.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114656214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?
Pub Date: 2022-12-30. DOI: 10.48550/arXiv.2212.14511
Yi Tian, K. Zhang, Russ Tedrake, S. Sra
We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear whether such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory and that is also known to be empirically valuable for learning state representations.
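As a rough illustration of "predict costs, not observations": learn an encoder, linear latent dynamics, and a cost head by regressing multi-step predicted costs onto observed costs, with no reconstruction term. The architecture and loss below are assumptions made for illustration (e.g., a generic cost head rather than the quadratic structure of LQG) and are not the paper's estimator or analysis.

import torch
import torch.nn as nn

class DirectLatentModel(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)            # state-representation function
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)   # latent dynamics
        self.B = nn.Linear(act_dim, latent_dim, bias=False)
        self.cost_head = nn.Linear(latent_dim + act_dim, 1)      # per-step cost predictor

    def multi_step_cost_loss(self, obs0, actions, costs):
        # obs0: (obs_dim,), actions: (T, act_dim), costs: (T,).
        # Roll the latent model forward and regress predicted costs onto observed ones.
        z = self.encoder(obs0)
        loss = torch.zeros(())
        for t in range(actions.shape[0]):
            pred = self.cost_head(torch.cat([z, actions[t]]))
            loss = loss + (pred.squeeze() - costs[t]) ** 2
            z = self.A(z) + self.B(actions[t])
        return loss / actions.shape[0]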
{"title":"Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?","authors":"Yi Tian, K. Zhang, Russ Tedrake, S. Sra","doi":"10.48550/arXiv.2212.14511","DOIUrl":"https://doi.org/10.48550/arXiv.2212.14511","url":null,"abstract":"We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130463164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data-driven Stochastic Output-Feedback Predictive Control: Recursive Feasibility through Interpolated Initial Conditions
Pub Date: 2022-12-15. DOI: 10.48550/arXiv.2212.07661
Guanru Pan, Ruchuan Ou, T. Faulwasser
The paper investigates data-driven output-feedback predictive control of linear systems subject to stochastic disturbances. The scheme relies on the recursive solution of a suitable data-driven reformulation of a stochastic Optimal Control Problem (OCP), which allows for forward prediction and optimization of statistical distributions of inputs and outputs. Our approach avoids the use of parametric system models. Instead, it is based on previously recorded data, using a recently proposed stochastic variant of Willems' fundamental lemma. The stochastic variant of the lemma is applicable to a large class of linear dynamics subject to stochastic disturbances of Gaussian and non-Gaussian nature. To ensure recursive feasibility, the initial condition of the OCP -- which consists of information about past inputs and outputs -- is treated as an extra decision variable of the OCP. We provide sufficient conditions for recursive feasibility and closed-loop practical stability of the proposed scheme, as well as performance bounds. Finally, a numerical example illustrates the efficacy and closed-loop properties of the proposed scheme.
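For readers unfamiliar with the deterministic backbone of such schemes, a compact numpy sketch of trajectory prediction via Willems' fundamental lemma is given below: block-Hankel matrices built from one recorded input/output trajectory parameterize the outputs produced by a candidate future input. The stochastic reformulation, the disturbance model, and the interpolated initial condition of the paper are not shown; the least-squares solve is an illustrative simplification.

import numpy as np

def block_hankel(w, L):
    # Block-Hankel matrix with L block rows from a signal w of shape (T, dim).
    T, _ = w.shape
    cols = T - L + 1
    return np.vstack([w[i:i + cols].T for i in range(L)])

def predict_outputs(u_data, y_data, u_ini, y_ini, u_future):
    # Predict the outputs corresponding to u_future, given an initial trajectory (u_ini, y_ini)
    # and previously recorded data (u_data, y_data), in the noise-free LTI case.
    T_ini, N = len(u_ini), len(u_future)
    m, p = u_data.shape[1], y_data.shape[1]
    Hu = block_hankel(u_data, T_ini + N)
    Hy = block_hankel(y_data, T_ini + N)
    Up, Uf = Hu[:T_ini * m], Hu[T_ini * m:]
    Yp, Yf = Hy[:T_ini * p], Hy[T_ini * p:]
    lhs = np.vstack([Up, Yp, Uf])
    rhs = np.concatenate([u_ini.ravel(), y_ini.ravel(), u_future.ravel()])
    g, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)   # combination of recorded trajectories
    return (Yf @ g).reshape(N, p)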
{"title":"Data-driven Stochastic Output-Feedback Predictive Control: Recursive Feasibility through Interpolated Initial Conditions","authors":"Guanru Pan, Ruchuan Ou, T. Faulwasser","doi":"10.48550/arXiv.2212.07661","DOIUrl":"https://doi.org/10.48550/arXiv.2212.07661","url":null,"abstract":"The paper investigates data-driven output-feedback predictive control of linear systems subject to stochastic disturbances. The scheme relies on the recursive solution of a suitable data-driven reformulation of a stochastic Optimal Control Problem (OCP), which allows for forward prediction and optimization of statistical distributions of inputs and outputs. Our approach avoids the use of parametric system models. Instead it is based on previously recorded data using a recently proposed stochastic variant of Willems' fundamental lemma. The stochastic variant of the lemma is applicable to a large class of linear dynamics subject to stochastic disturbances of Gaussian and non-Gaussian nature. To ensure recursive feasibility, the initial condition of the OCP -- which consists of information about past inputs and outputs -- is considered as an extra decision variable of the OCP. We provide sufficient conditions for recursive feasibility and closed-loop practical stability of the proposed scheme as well as performance bounds. Finally, a numerical example illustrates the efficacy and closed-loop properties of the proposed scheme.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122146886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid Multi-agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems
Pub Date: 2022-12-14. DOI: 10.48550/arXiv.2212.07313
Tobias Enders, James Harrison, M. Pavone, Maximilian Schiffer
We consider the sequential decision-making problem of making proactive request assignment and rejection decisions for a profit-maximizing operator of an autonomous mobility-on-demand system. We formalize this problem as a Markov decision process and propose a novel combination of multi-agent Soft Actor-Critic and weighted bipartite matching to obtain an anticipative control policy. Thereby, we factorize the operator's otherwise intractable action space while still obtaining a globally coordinated decision. Experiments based on real-world taxi data show that our method outperforms state-of-the-art benchmarks with respect to performance, stability, and computational tractability.
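One way to read the "factorized scores, globally coordinated decision" idea: the learned policy produces a score for every vehicle-request pair, and a weighted bipartite matching then turns these per-pair scores into a joint assignment. A minimal sketch using scipy's linear_sum_assignment; the score construction and the rejection rule are illustrative assumptions, not the paper's exact mechanism.

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_requests(scores, reject_threshold=0.0):
    # scores: (n_vehicles, n_requests) matrix of learned assignment values.
    # The Hungarian algorithm maximizes the total score (negated to fit the cost convention).
    rows, cols = linear_sum_assignment(-scores)
    # Requests whose matched score falls below the threshold are proactively rejected.
    return [(v, r) for v, r in zip(rows, cols) if scores[v, r] >= reject_threshold]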
{"title":"Hybrid Multi-agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems","authors":"Tobias Enders, James Harrison, M. Pavone, Maximilian Schiffer","doi":"10.48550/arXiv.2212.07313","DOIUrl":"https://doi.org/10.48550/arXiv.2212.07313","url":null,"abstract":"We consider the sequential decision-making problem of making proactive request assignment and rejection decisions for a profit-maximizing operator of an autonomous mobility on demand system. We formalize this problem as a Markov decision process and propose a novel combination of multi-agent Soft Actor-Critic and weighted bipartite matching to obtain an anticipative control policy. Thereby, we factorize the operator's otherwise intractable action space, but still obtain a globally coordinated decision. Experiments based on real-world taxi data show that our method outperforms state of the art benchmarks with respect to performance, stability, and computational tractability.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123995434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data
Pub Date: 2022-12-12. DOI: 10.48550/arXiv.2212.06253
Prithvi Akella, Skylar X. Wei, J. Burdick, A. Ames
Recent advances in safety-critical risk-aware control are predicated on a priori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk -- a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound on its Surface-at-Risk via Gaussian process regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight the performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.
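A minimal sketch of the second step described above: fit a Gaussian process to observed norms of the model/system discrepancy and read off an upper envelope from the posterior. The kernel choice and the mean-plus-scaled-standard-deviation bound are illustrative assumptions; the paper's Surface-at-Risk bound and its accuracy guarantees are constructed differently.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_discrepancy_bound(states, discrepancy_norms, beta=2.0):
    # states: (N, n) operating points; discrepancy_norms: (N,) observed values of
    # ||x_next_true - x_next_model|| collected during operation.
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(states, discrepancy_norms)

    def upper_bound(x):
        mean, std = gp.predict(np.atleast_2d(x), return_std=True)
        return mean + beta * std        # posterior mean plus scaled standard deviation
    return upper_bound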
{"title":"Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data","authors":"Prithvi Akella, Skylar X. Wei, J. Burdick, A. Ames","doi":"10.48550/arXiv.2212.06253","DOIUrl":"https://doi.org/10.48550/arXiv.2212.06253","url":null,"abstract":"Recent advances in safety-critical risk-aware control are predicated on apriori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk -- a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114484461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Targeted Adversarial Attacks against Neural Network Trajectory Predictors
Pub Date: 2022-12-08. DOI: 10.48550/arXiv.2212.04138
Kai Liang Tan, J. Wang, Y. Kantaros
Trajectory prediction is an integral component of modern autonomous systems, as it allows for envisioning the future intentions of nearby moving agents. Because other agents' dynamics and control policies are generally unavailable, deep neural network (DNN) models are often employed for trajectory forecasting tasks. Although there exists an extensive literature on improving the accuracy of these models, very few works study their robustness against adversarially crafted input trajectories. To bridge this gap, in this paper we propose a targeted adversarial attack against DNN models for trajectory forecasting tasks. We call the proposed attack TA4TP (Targeted adversarial Attack for Trajectory Prediction). Our approach generates adversarial input trajectories that are capable of fooling DNN models into predicting user-specified target/desired trajectories. Our attack relies on solving a nonlinear constrained optimization problem in which the objective function captures the deviation of the predicted trajectory from a target one, while the constraints model physical requirements that the adversarial input should satisfy. The latter ensure that the inputs look natural and are safe to execute (e.g., they are close to nominal inputs and away from obstacles). We demonstrate the effectiveness of TA4TP on two state-of-the-art DNN models and two datasets. To the best of our knowledge, this is the first targeted adversarial attack against DNN models used for trajectory forecasting.
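The attack reduces to a constrained optimization over the input trajectory; a generic projected-gradient sketch of that reduction is shown below, with an l-infinity ball around the nominal input standing in for the paper's richer physical-feasibility constraints (closeness to nominal inputs, obstacle avoidance).

import torch

def targeted_trajectory_attack(model, x_nominal, target_traj, eps=0.1, steps=100, lr=0.01):
    # model: trajectory predictor mapping an input history to a predicted future trajectory.
    # Finds a perturbed history whose prediction is close to target_traj while the
    # perturbation stays within an l-infinity ball of radius eps (illustrative constraint).
    delta = torch.zeros_like(x_nominal, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        pred = model(x_nominal + delta)
        loss = torch.mean((pred - target_traj) ** 2)   # deviation from the target trajectory
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                    # project onto the constraint set
    return (x_nominal + delta).detach()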
{"title":"Targeted Adversarial Attacks against Neural Network Trajectory Predictors","authors":"Kai Liang Tan, J. Wang, Y. Kantaros","doi":"10.48550/arXiv.2212.04138","DOIUrl":"https://doi.org/10.48550/arXiv.2212.04138","url":null,"abstract":"Trajectory prediction is an integral component of modern autonomous systems as it allows for envisioning future intentions of nearby moving agents. Due to the lack of other agents' dynamics and control policies, deep neural network (DNN) models are often employed for trajectory forecasting tasks. Although there exists an extensive literature on improving the accuracy of these models, there is a very limited number of works studying their robustness against adversarially crafted input trajectories. To bridge this gap, in this paper, we propose a targeted adversarial attack against DNN models for trajectory forecasting tasks. We call the proposed attack TA4TP for Targeted adversarial Attack for Trajectory Prediction. Our approach generates adversarial input trajectories that are capable of fooling DNN models into predicting user-specified target/desired trajectories. Our attack relies on solving a nonlinear constrained optimization problem where the objective function captures the deviation of the predicted trajectory from a target one while the constraints model physical requirements that the adversarial input should satisfy. The latter ensures that the inputs look natural and they are safe to execute (e.g., they are close to nominal inputs and away from obstacles). We demonstrate the effectiveness of TA4TP on two state-of-the-art DNN models and two datasets. To the best of our knowledge, we propose the first targeted adversarial attack against DNN models used for trajectory forecasting.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115571068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Concentration Phenomenon for Random Dynamical Systems: An Operator Theoretic Approach
Pub Date: 2022-12-07. DOI: 10.48550/arXiv.2212.03670
Muhammad Naeem
Via operator-theoretic methods, we formalize the concentration phenomenon for a given observable $r$ of a discrete-time Markov chain with invariant ergodic measure $\mu_{\pi}$, possibly having support on an unbounded state space. The main contribution of this paper is to circumvent tedious probabilistic methods by studying the composition of the Markov transition operator $P$ followed by the multiplication operator defined by $e^{r}$. It turns out that even if the observable/reward function is unbounded, as long as for some $q>2$, $\|e^{r}\|_{q \rightarrow 2} \propto \exp\big(\mu_{\pi}(r) + \frac{2q}{q-2}\big)$ and $P$ is hyperbounded with norm control $\|P\|_{2 \rightarrow q} < e^{\frac{1}{2}\left[\frac{1}{2}-\frac{1}{q}\right]}$, sharp non-asymptotic concentration bounds follow. A transport-entropy inequality ensures the aforementioned upper bound on the multiplication operator for all $q>2$. The role of reversibility in the concentration phenomenon is demystified. These results are particularly useful for the reinforcement learning and controls communities, as they allow for concentration inequalities with respect to standard unbounded observables/reward functions when exact knowledge of the system is not available, let alone the reversibility of the stationary measure.
{"title":"Concentration Phenomenon for Random Dynamical Systems: An Operator Theoretic Approach","authors":"Muhammad Naeem","doi":"10.48550/arXiv.2212.03670","DOIUrl":"https://doi.org/10.48550/arXiv.2212.03670","url":null,"abstract":"Via operator theoretic methods, we formalize the concentration phenomenon for a given observable `$r$' of a discrete time Markov chain with `$mu_{pi}$' as invariant ergodic measure, possibly having support on an unbounded state space. The main contribution of this paper is circumventing tedious probabilistic methods with a study of a composition of the Markov transition operator $P$ followed by a multiplication operator defined by $e^{r}$. It turns out that even if the observable/ reward function is unbounded, but for some for some $q>2$, $|e^{r}|_{q rightarrow 2} propto expbig(mu_{pi}(r) +frac{2q}{q-2}big) $ and $P$ is hyperbounded with norm control $|P|_{2 rightarrow q }<e^{frac{1}{2}[frac{1}{2}-frac{1}{q}]}$, sharp non-asymptotic concentration bounds follow. emph{Transport-entropy} inequality ensures the aforementioned upper bound on multiplication operator for all $q>2$. The role of emph{reversibility} in concentration phenomenon is demystified. These results are particularly useful for the reinforcement learning and controls communities as they allow for concentration inequalities w.r.t standard unbounded obersvables/reward functions where exact knowledge of the system is not available, let alone the reversibility of stationary measure.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125311900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}