Parallel Least-Squares Policy Iteration

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Pub Date: 2016-10-01, DOI: 10.1109/DSAA.2016.24

Jun-Kun Wang, Shou-de Lin

Citations: 1

Abstract

Inspired by recent progress in parallel and distributed optimization, we propose parallel least-squares policy iteration (parallel LSPI). LSPI is a policy iteration method for finding an optimal policy for Markov decision processes (MDPs). Because solving MDPs with large state spaces is challenging and time-consuming, we propose a parallel variant of LSPI that can leverage multiple computational resources. A preliminary analysis of the proposed method shows that the sample complexity for each worker improves from O(1/√n) to O(1/√(Mn)), where n is the number of samples and M is the number of workers. Experiments show the advantages of parallel LSPI compared with standard, non-parallel LSPI.
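
The abstract does not spell out how the work is divided among workers. As a rough illustration only, the sketch below assumes one simple data-parallel scheme for LSPI's policy-evaluation step (LSTDQ): each of M workers accumulates the statistics A_m and b_m on its own shard of samples, the per-worker sums are added, and the linear system is solved once before greedy policy improvement. This decomposition is an assumption, not necessarily the authors' method. The toy chain MDP, the one-hot features, the regularizer `reg`, and all function names are hypothetical, and `ThreadPoolExecutor` merely stands in for M separate workers.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # hypothetical toy chain MDP
K = N_STATES * N_ACTIONS                 # feature dimension

def phi(s, a):
    """Tabular (one-hot) state-action features."""
    v = np.zeros(K)
    v[s * N_ACTIONS + a] = 1.0
    return v

def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right.
    Reward 1 whenever the next state is the last state, else 0."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next

def local_stats(shard, policy):
    """One worker: accumulate LSTDQ statistics A_m, b_m over its own samples."""
    A, b = np.zeros((K, K)), np.zeros(K)
    for s, a, r, s_next in shard:
        f = phi(s, a)
        A += np.outer(f, f - GAMMA * phi(s_next, policy(s_next)))
        b += r * f
    return A, b

def parallel_lstdq(shards, policy, reg=1e-6):
    """Evaluate `policy`: workers build local statistics, which are summed
    and solved once as (A + reg * I) w = b."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        stats = list(pool.map(lambda shard: local_stats(shard, policy), shards))
    A = sum(A_m for A_m, _ in stats)
    b = sum(b_m for _, b_m in stats)
    return np.linalg.solve(A + reg * np.eye(K), b)

def greedy(w):
    """Policy improvement: act greedily w.r.t. Q(s, a) = w . phi(s, a)."""
    return lambda s: int(np.argmax([w @ phi(s, a) for a in range(N_ACTIONS)]))

def lspi(shards, max_iter=20):
    """Alternate parallel evaluation and greedy improvement until w stops changing."""
    w = np.zeros(K)
    for _ in range(max_iter):
        w_new = parallel_lstdq(shards, greedy(w))
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

# Every (s, a) pair is sampled twice, then dealt round-robin to M = 4 workers.
samples = [(s, a, *step(s, a)) for s in range(N_STATES) for a in range(N_ACTIONS)] * 2
M = 4
shards = [samples[i::M] for i in range(M)]

w = lspi(shards)
policy = greedy(w)
print([policy(s) for s in range(N_STATES)])  # expected: [1, 1, 1, 1, 1] (always move right)
```

Because the summed statistics are exactly those a single machine would compute from all Mn samples, the pooled estimate behaves like one fit on Mn samples. Under an error rate proportional to one over the square root of the sample size, that gives 1/√(Mn) = (1/√M)·(1/√n), a √M-fold tightening over one worker alone, which is one way to read the rate quoted in the abstract. In CPython, threads give little real speedup for this tiny Python loop; a real deployment would more likely use processes or separate machines.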

