
Military Operations Research: Latest Articles

Sample and Computationally Efficient Stochastic Kriging in High Dimensions
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2020-10-14 | DOI: 10.1287/opre.2022.2367
Liang Ding, Xiaowei Zhang
High-Dimensional Simulation Metamodeling. Stochastic kriging has been widely employed for simulation metamodeling to predict the response surface of complex simulation models. However, its use is limited to cases where the design space is low-dimensional because the sample complexity (i.e., the number of design points required to produce an accurate prediction) grows exponentially in the dimensionality of the design space. The large sample size results in both a prohibitive sample cost for running the simulation model and a severe computational challenge due to the need to invert large covariance matrices. To address this long-standing challenge, Liang Ding and Xiaowei Zhang, in their recent paper “Sample and Computationally Efficient Stochastic Kriging in High Dimensions,” develop a novel methodology, based on tensor Markov kernels and sparse grid experimental designs, that dramatically alleviates the curse of dimensionality. The proposed methodology has theoretical guarantees on both sample complexity and computational complexity and shows outstanding performance in numerical problems of up to 16,675 dimensions.
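To make the exponential growth in sample complexity concrete, the following minimal Python sketch compares the number of design points in a full tensor grid with the number in a classical Smolyak-type sparse grid. It assumes nested one-dimensional point sets with 2^l - 1 interior points at level l (no boundary points). This is a textbook construction chosen for illustration and not necessarily the design used by Ding and Zhang; the function names are hypothetical.

```python
def full_grid_size(level: int, dim: int) -> int:
    """Points in a full tensor grid with 2**level - 1 points per dimension."""
    return (2 ** level - 1) ** dim


def sparse_grid_size(level: int, dim: int) -> int:
    """Points in a classical sparse grid (assumed construction, see lead-in).

    The grid is the union of hierarchical increments indexed by
    (l_1, ..., l_dim) with l_i >= 1 and sum(l_i - 1) <= level - 1.
    A one-dimensional increment at level l adds 2**(l - 1) new points.
    """
    # counts[s] = number of points from multi-indices with sum(l_i - 1) == s
    counts = [1] + [0] * (level - 1)
    for _ in range(dim):
        new_counts = [0] * level
        for s, c in enumerate(counts):
            if c == 0:
                continue
            # k = l_i - 1 for the dimension being added
            for k in range(level - s):
                new_counts[s + k] += c * 2 ** k
        counts = new_counts
    return sum(counts)


if __name__ == "__main__":
    for d in (1, 2, 5, 10):
        print(d, full_grid_size(3, d), sparse_grid_size(3, d))
```

At level 3 in two dimensions, for example, the full grid has 49 points while the sparse grid under these assumptions has 17; the gap widens rapidly as the dimension grows.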
Citations: 2
An Online Learning Approach to Dynamic Pricing and Capacity Sizing in Service Systems
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2020-09-07 | DOI: 10.1287/opre.2020.0612
Xinyun Chen, Yunan Liu, Guiyu Hong
Online Learning in Queueing Systems. Most queueing models have no analytic solutions, so previous research often resorts to heavy-traffic analysis for performance analysis and optimization, which requires the system scale (e.g., arrival and service rates) to grow to infinity. In “An Online Learning Approach to Dynamic Pricing and Capacity Sizing in Service Systems,” X. Chen, Y. Liu, and G. Hong develop a new “scale-free” online learning framework designed for optimizing a queueing system, called gradient-based online learning in queue (GOLiQ). GOLiQ prescribes an efficient procedure to obtain improved decisions in successive cycles using newly collected queueing data (e.g., arrival counts, waiting times, and busy times). Besides being robust to the system scale, GOLiQ is advantageous when focusing on performance optimization in the long run because its data-driven nature enables it to constantly produce improved solutions that eventually reach optimality. The effectiveness of GOLiQ is substantiated by theoretical regret analysis (with a logarithmic regret bound) and simulation experiments.
Citations: 16
Heavy-Traffic Universality of Redundancy Systems with Assignment Constraints
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2020-05-29 | DOI: 10.1287/opre.2022.2385
Ellen Cardinaels, S. Borst, J. V. van Leeuwaarden
Modern service systems, like cloud computing platforms or data center environments, commonly face a high degree of heterogeneity. This heterogeneity is caused not only by different server speeds but also by binding task-server relations that must be taken into account when assigning incoming tasks. Unfortunately, there are hardly any theoretical performance guarantees, as these systems do not fall within the typical supermarket modeling framework, which relies heavily on strong symmetry and homogeneity assumptions. In “Heavy-traffic universality of redundancy systems with assignment constraints,” Cardinaels, Borst, and van Leeuwaarden provide insight into the performance of these systems operating under redundancy scheduling policies. Surprisingly, when experiencing high demand, these systems exhibit state space collapse and can achieve a similar level of resource pooling and performance as a fully flexible system, even when subject to quite strict task-server constraints.
Citations: 2
Scores for Multivariate Distributions and Level Sets
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2020-02-21 | DOI: 10.1287/opre.2020.0365
Xiaochun Meng, James W. Taylor, Souhaib Ben Taieb, Siran Li
Evaluating Forecasts of Multivariate Probability Distributions. Forecasts of multivariate probability distributions are required for a variety of applications. The availability of a score for a forecast is important for evaluating prediction accuracy, as well as estimating model parameters. In “Scores for Multivariate Distributions and Level Sets,” X. Meng, J. W. Taylor, S. Ben Taieb, and S. Li propose a theoretical framework that encompasses several existing scores for multivariate distributions and can be used to generate new scores. In some multivariate contexts, a forecast of a level set is needed, such as a density level set for anomaly detection or the level set of the cumulative distribution, which can be used as a measure of risk. This motivates consideration of scores for level sets. The authors show that such scores can be obtained by decomposing the scores developed for multivariate distributions. A simple numerical algorithm is presented to compute the scores, and practical applications are provided in the contexts of conditional value-at-risk for financial data and the combination of expert macroeconomic forecasts.
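As an illustration of what a score for a multivariate distributional forecast can look like, the sketch below computes a sample-based estimate of the energy score, a widely used proper scoring rule. The abstract does not say whether the energy score is among the existing scores covered by the authors' framework, so this is only an assumed example; the function name and the sample-based estimator are choices made here, not taken from the paper.

```python
import math
from typing import Sequence


def energy_score(samples: Sequence[Sequence[float]],
                 observation: Sequence[float]) -> float:
    """Sample estimate of the energy score (lower is better).

    samples: draws from the forecast distribution, each a vector.
    observation: the realized outcome, a vector of the same length.
    Estimator: mean ||X_i - y|| - 0.5 * mean over all pairs ||X_i - X_j||.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("at least one forecast sample is required")
    # Average distance between forecast draws and the observation
    accuracy = sum(math.dist(x, observation) for x in samples) / m
    # Half the average pairwise distance among forecast draws (spread term)
    spread = sum(math.dist(a, b) for a in samples for b in samples) / (2 * m * m)
    return accuracy - spread


if __name__ == "__main__":
    draws = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)]
    print(energy_score(draws, (0.0, 0.0)))
```

The first term rewards forecasts whose draws lie close to the outcome, and the second term keeps a forecast from improving its score merely by collapsing all draws onto a single point.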
Citations: 1
Lagrangian Dual Decision Rules for Multistage Stochastic Mixed-Integer Programming
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2020-01-03 | DOI: 10.1287/opre.2022.2366
Maryam Daryalal, Merve Bodur, James R. Luedtke
On Decision Rules for Multistage Stochastic Programs with Mixed-Integer Decisions. Multistage stochastic programming is a field of stochastic optimization for addressing sequential decision-making problems defined over a stochastic process with a given probability distribution. The solution to such a problem is a decision rule (policy) that maps the history of observations to the decisions. Designing decision rules in the presence of mixed-integer decisions is quite challenging. In “Lagrangian Dual Decision Rules for Multistage Stochastic Mixed-Integer Programming,” Daryalal, Bodur, and Luedtke introduce Lagrangian dual decision rules, where linear decision rules are applied to dual multipliers associated with Lagrangian duals of a multistage stochastic mixed-integer programming (MSMIP) model. The restricted decisions are then used in the development of new primal- and dual-bounding methods. This yields a new general-purpose approximation approach for MSMIP, free of strong assumptions made in the literature, such as stagewise independence or the existence of a tractable-sized scenario-tree representation.
Citations: 5
Graphentheorie (Graph Theory)
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2019-12-10 | DOI: 10.1007/978-3-662-60783-1_5
D. Briskorn
Citations: 0
Modellierung (Modeling)
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2019-12-10 | DOI: 10.1007/978-3-662-60783-1_2
D. Briskorn
Citations: 3
Einleitung (Introduction)
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2019-12-10 | DOI: 10.1007/978-3-662-60783-1_1
D. Briskorn
Citations: 0
Lineare Optimierung (Linear Optimization)
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2019-12-10 | DOI: 10.1007/978-3-662-60783-1_3
Dirk Briskorn
Citations: 0
Uncertainty Quantification and Exploration for Reinforcement Learning
IF 0.7 | Zone 4, Management | Q3 Engineering | Pub Date: 2019-10-12 | DOI: 10.1287/opre.2023.2436
Yi Zhu, Jing Dong, H. Lam
Quantify the Uncertainty to Decide and Explore Better. In statistical inference, large-sample behavior and confidence interval construction are fundamental in assessing the error and reliability of estimated quantities with respect to noise in the data. In the paper “Uncertainty Quantification and Exploration for Reinforcement Learning,” Dong, Lam, and Zhu study the large-sample behavior in the classic setting of reinforcement learning. They derive appropriate large-sample asymptotic distributions for the state-action value function (Q-value) and optimal value function estimations when data are collected from the underlying Markov chain. This allows one to evaluate the assertiveness of performances among different decisions. The tight uncertainty quantification also facilitates the development of a pure exploration policy by maximizing the worst-case relative discrepancy among the estimated Q-values (ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data to maximize the probability of learning the optimal reward-collecting policy, and it achieves good empirical performance.
Citations: 1