Pub Date: 2011-12-01 | Epub Date: 2011-10-17 | DOI: 10.1109/TNN.2011.2170094
Zhirong Yang, Erkki Oja
Multiplicative updates have been widely used in approximative nonnegative matrix factorization (NMF) optimization because they are convenient to deploy. Their convergence proof is usually based on the minimization of an auxiliary upper-bounding function, whose construction, however, remains case-specific and is available only for limited types of dissimilarity measures. Here we make significant progress in developing convergent multiplicative algorithms for NMF. First, we propose a general approach to derive the auxiliary function for a wide variety of NMF problems, as long as the approximation objective can be expressed as a finite sum of monomials with real exponents. Multiplicative algorithms with a theoretical guarantee of a monotonically decreasing objective function sequence can thus be obtained. The solutions of NMF based on the most commonly used dissimilarity measures, such as the α- and β-divergences, as well as many other more comprehensive divergences, can be derived by the new unified principle. Second, our method is extended to a nonseparable case that includes, for example, the γ-divergence and the Rényi divergence. Third, we develop multiplicative algorithms for NMF using second-order approximative factorizations, in which each factorizing matrix may appear twice. Preliminary numerical experiments demonstrate that the multiplicative algorithms developed using the proposed procedure can achieve satisfactory Karush-Kuhn-Tucker optimality. We also demonstrate NMF problems where algorithms derived by the conventional method fail to guarantee descent at each iteration, whereas those derived by our principle are immune to such violations.
Unified development of multiplicative algorithms for linear and quadratic nonnegative matrix factorization. IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1878-1891.
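As a rough illustration of the multiplicative-update principle this abstract builds on, the sketch below runs the classical Lee-Seung updates for the Frobenius (Euclidean) NMF objective, the simplest member of the family the paper generalizes (not the paper's unified derivation), and checks the monotone-descent property:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonnegative data matrix V, to be approximated as V ~ W @ H.
V = rng.random((20, 15))
r = 4
W = rng.random((20, r)) + 0.1   # strictly positive init avoids zero-locking
H = rng.random((r, 15)) + 0.1

def frob(V, W, H):
    return float(np.linalg.norm(V - W @ H) ** 2)

eps = 1e-12                     # guard against division by zero
errors = [frob(V, W, H)]
for _ in range(100):
    # Classical multiplicative updates for the Frobenius objective:
    # each factor is rescaled elementwise, so nonnegativity is preserved.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
    errors.append(frob(V, W, H))

# The auxiliary-function argument guarantees a nonincreasing objective.
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(errors, errors[1:]))
```

Each update multiplies the factor by a ratio of nonnegative matrices, which is exactly why multiplicative algorithms are "convenient to deploy": no step size or projection is needed.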
Pub Date: 2011-12-01 | Epub Date: 2011-11-30 | DOI: 10.1109/TNN.2011.2175946
Chow Yin Lai, Cheng Xiang, Tong Heng Lee
The piecewise affine (PWA) model is an attractive model structure for approximating nonlinear systems. In this paper, a procedure for obtaining piecewise affine autoregressive exogenous (PWARX) models of nonlinear systems is proposed. The two key components defining a PWARX model, namely, the parameters of the locally affine subsystems and the partition of the regressor space, are estimated: the former through a least-squares-based identification method using multiple models, and the latter using standard classifiers such as a neural network or a support vector machine. Having obtained the PWARX model of the nonlinear system, a controller is then derived to control the system for reference tracking. Both simulation and experimental studies show that the proposed algorithm provides accurate PWA approximations of nonlinear systems, and that the designed controller delivers good tracking performance.
Data-based identification and control of nonlinear systems via piecewise affine approximation. IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2189-2200.
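A minimal sketch of the two-step PWA idea: partition the regressor space, then fit one affine law per region by least squares. A fixed interval grid stands in for the paper's classifier-based partition, and a single static regressor stands in for the full ARX lag structure; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data from an unknown nonlinearity y = f(x) (a stand-in for an ARX
# regressor/output pair); the goal is one affine law per region.
x = rng.uniform(-2.0, 2.0, 400)
y = np.tanh(2.0 * x) + 0.01 * rng.standard_normal(400)

# Step 1: partition the regressor space (fixed grid in place of the
# neural-network / SVM classifier used in the paper).
edges = np.linspace(-2.0, 2.0, 6)            # 5 regions
models = []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (x >= lo) & (x <= hi)
    # Step 2: least-squares fit of a local affine model y = a*x + b.
    A = np.column_stack([x[mask], np.ones(mask.sum())])
    theta, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
    models.append(theta)

def pwa_predict(xq):
    i = int(np.clip(np.searchsorted(edges, xq, side="right") - 1,
                    0, len(models) - 1))
    a, b = models[i]
    return a * xq + b

rms = np.sqrt(np.mean([(pwa_predict(xi) - yi) ** 2 for xi, yi in zip(x, y)]))
assert rms < 0.15        # the PWA model tracks the nonlinearity closely
```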
In this paper, synchronization of continuous dynamical networks with discrete-time communications is studied. Although the dynamical behavior of each node is continuous-time, the communications between any two nodes are discrete-time, i.e., they are active only at certain discrete time instants. Moreover, the intervals between consecutive communication instants can be uncertain and variable. By choosing a piecewise Lyapunov-Krasovskii functional to capture the characteristics of the discrete communication instants and by utilizing a convex combination technique, a synchronization criterion is derived in terms of linear matrix inequalities, together with an upper bound on the allowable communication intervals. The results extend and improve upon earlier work. Simulation results show the effectiveness of the proposed communication scheme. Some relationships between the allowable upper bound of the communication intervals and the coupling strength of the network are illustrated through simulations on a fully connected network, a star-like network, and a nearest-neighbor network.
Synchronization of continuous dynamical networks with discrete-time communications. Yan-Wu Wang, Jiang-Wen Xiao, Changyun Wen, Zhi-Hong Guan. IEEE Transactions on Neural Networks, pp. 1979-1986. Pub Date: 2011-12-01 | DOI: 10.1109/TNN.2011.2171501
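The communication scheme can be illustrated on a toy pair of scalar nodes: each node evolves in continuous time, but the coupling input is refreshed only at randomly spaced communication instants and held constant in between. This is a simulation sketch with made-up constants, not the paper's Lyapunov-Krasovskii analysis:

```python
import numpy as np

rng = np.random.default_rng(2)

a, c = 0.2, 0.5            # node drift (unstable alone), coupling strength
dt = 0.001
x = np.array([1.0, -1.0])  # two scalar nodes, different initial states

t, next_comm = 0.0, 0.0
u = np.zeros(2)
diffs = []
while t < 20.0:
    if t >= next_comm:
        # Communication instant: nodes exchange states; the coupling input
        # is then held until the next instant, whose spacing is uncertain
        # and variable.
        u = c * (x[::-1] - x)
        next_comm = t + rng.uniform(0.1, 0.5)
    x = x + dt * (a * x + u)   # continuous node dynamics (Euler step)
    t += dt
    diffs.append(abs(x[0] - x[1]))

assert diffs[-1] < 1e-2 * diffs[0]   # synchronization error decays
```

With a bounded communication interval and sufficient coupling strength the sampled error contracts at every instant, which is the qualitative relationship the paper quantifies via its LMI bound.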
Pub Date: 2011-12-01 | Epub Date: 2011-10-31 | DOI: 10.1109/TNN.2011.2172627
Jie Cheng, Mohammad R Sayeh, Mehdi R Zargham, Qiang Cheng
This brief presents a dynamical system approach to vector quantization or clustering based on ordinary differential equations, with the potential for real-time implementation. Two examples with different pattern clusters demonstrate that the model can successfully quantize different types of input patterns. Furthermore, we analyze the stability of the dynamical system: by identifying the equilibrium points for certain input patterns and analyzing their stability, we characterize the quantizing behavior of the system with respect to its vigilance parameter. The proposed system is applied to two real-world problems, providing results comparable to the best reported findings, which validates the effectiveness of the proposed approach.
Real-time vector quantization and clustering based on ordinary differential equations. IEEE Transactions on Neural Networks, pp. 2143-2148.
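A minimal sketch of ODE-driven quantization: codebook prototypes flow toward the mean of the samples they currently win, i.e., a gradient flow on the quantization error, integrated by forward Euler. This generic dynamics is an illustrative assumption; the authors' specific model and its vigilance parameter are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two well-separated pattern clusters around (0, 0) and (3, 3).
data = np.vstack([rng.normal(0.0, 0.2, (50, 2)),
                  rng.normal(3.0, 0.2, (50, 2))])

# Prototype ODE, Euler-integrated: each codebook vector moves toward the
# mean of the samples it currently wins.
m = np.array([[1.0, 1.0], [2.0, 2.0]])   # two prototypes
dt = 0.05
for _ in range(400):
    d = np.linalg.norm(data[:, None, :] - m[None, :, :], axis=2)
    win = d.argmin(axis=1)               # nearest-prototype assignment
    for j in range(len(m)):
        if np.any(win == j):
            m[j] += dt * (data[win == j].mean(axis=0) - m[j])

# Each prototype settles at an equilibrium near one cluster center.
assert np.linalg.norm(m[0] - [0.0, 0.0]) < 0.2
assert np.linalg.norm(m[1] - [3.0, 3.0]) < 0.2
```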
Pub Date: 2011-12-01 | Epub Date: 2011-10-31 | DOI: 10.1109/TNN.2011.2169682
Zhishan Guo, Qingshan Liu, Jun Wang
In this paper, a one-layer recurrent neural network is presented for solving pseudoconvex optimization problems subject to linear equality constraints. The global convergence of the neural network can be guaranteed even though the objective function is only pseudoconvex. Finite-time convergence of the state to the feasible region defined by the equality constraints is also proved. In addition, global exponential convergence is proved when the objective function is strongly pseudoconvex on the feasible region. Simulation results on illustrative examples and an application to chemical process data reconciliation demonstrate the effectiveness and characteristics of the neural network.
A one-layer recurrent neural network for pseudoconvex optimization subject to linear equality constraints. IEEE Transactions on Neural Networks, pp. 1892-1900.
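The flavor of such a network can be sketched as a projected gradient flow with a feasibility-restoring term, shown on a simple convex (hence pseudoconvex) quadratic with a known optimum. This generic dynamics is an assumption for illustration, not the paper's exact network:

```python
import numpy as np

# Minimize f(x) = ||x||^2 subject to A x = b; the optimum is x* = (2, 2, 2).
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([6.0])

pinv = A.T @ np.linalg.inv(A @ A.T)
P = np.eye(3) - pinv @ A        # projector onto the nullspace of A

def grad(x):
    return 2.0 * x

x = np.array([5.0, -2.0, 0.0])  # infeasible initial state
dt = 0.01
for _ in range(3000):
    # The pinv term drives A x toward b (feasibility); the projected
    # gradient term decreases f along feasible directions.
    x = x + dt * (-P @ grad(x) - pinv @ (A @ x - b))

assert np.allclose(A @ x, b, atol=1e-6)            # state becomes feasible
assert np.allclose(x, [2.0, 2.0, 2.0], atol=1e-3)  # converges to optimum
```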
Pub Date: 2011-12-01 | Epub Date: 2011-10-20 | DOI: 10.1109/TNN.2011.2170220
Yu Su, Shiguang Shan, Xilin Chen, Wen Gao
Fisher's linear discriminant (FLD) is one of the most widely used linear feature extraction methods, especially in visual computation tasks. Based on an analysis of several limitations of the traditional FLD, this paper proposes a new computational paradigm for discriminative linear feature extraction, named "classifiability-based discriminatory projection pursuit" (CDPP), which differs from the traditional FLD and its variants. The proposed CDPP has two steps: the construction of a candidate projection set (CPS), and the pursuit of discriminatory projections. Specifically, in the former step, candidate projections are generated using the nearest between-class boundary samples, while the latter is efficiently achieved by classifiability-based AdaBoost learning from the CPS. We show that the new "projection pursuit" paradigm not only avoids the limitations of the traditional FLD but also inherits good generalizability from the boundary attribute of the candidate projections. Extensive experiments on both synthetic and real datasets validate the effectiveness of CDPP for discriminative linear feature extraction.
Classifiability-based discriminatory projection pursuit. IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2050-2061.
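A toy sketch of the two CDPP steps on 2-D Gaussian classes. Candidate projections come from nearest between-class pairs, as in the abstract; a plain 1-D Fisher score is a hypothetical stand-in for the classifiability-based AdaBoost selection:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two classes separated along the first axis, with large within-class
# spread along the second axis.
X0 = rng.normal([0.0, 0.0], [1.0, 3.0], (60, 2))
X1 = rng.normal([4.0, 0.0], [1.0, 3.0], (60, 2))

# Step 1: candidate projection set from directions between each sample and
# its nearest sample of the other class (boundary pairs).
cands = []
for xs in X0:
    nn = X1[np.linalg.norm(X1 - xs, axis=1).argmin()]
    d = nn - xs
    cands.append(d / np.linalg.norm(d))

# Step 2: pursue the most discriminative candidate (1-D Fisher score in
# place of the paper's classifiability-based AdaBoost learning).
def fisher_score(w):
    p0, p1 = X0 @ w, X1 @ w
    return (p0.mean() - p1.mean()) ** 2 / (p0.var() + p1.var() + 1e-12)

best = max(cands, key=fisher_score)
assert abs(best[0]) > 0.7   # selected projection aligns with the class gap
```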
Pub Date: 2011-12-01 | Epub Date: 2011-10-20 | DOI: 10.1109/TNN.2011.2171362
Soroush Javidi, Clive Cheong Took, Danilo P Mandic
An extension of the fast independent component analysis algorithm is proposed for the blind separation of both Q-proper and Q-improper quaternion-valued signals. This is achieved by maximizing a negentropy-based cost function, and is derived rigorously using the recently developed HR calculus in order to implement Newton optimization in the augmented quaternion statistics framework. It is shown that the use of augmented statistics and the associated widely linear modeling provides theoretical and practical advantages when dealing with general quaternion signals with noncircular (rotation-dependent) distributions. Simulations using both benchmark and real-world quaternion-valued signals support the approach.
Fast independent component analysis algorithm for quaternion valued signals. IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1967-1978.
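For reference, a sketch of the real-valued one-unit FastICA iteration that the paper extends to quaternion signals (whitening, then a negentropy-based fixed-point update); the quaternion augmented statistics and HR-calculus machinery are omitted:

```python
import numpy as np

rng = np.random.default_rng(5)

# Two independent non-Gaussian sources (uniform, Laplace), linearly mixed.
n = 5000
S = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(0.0, 1.0, n)])
X = np.array([[2.0, 1.0], [1.0, 1.5]]) @ S

# Whitening: rotate and rescale the mixtures to identity covariance.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ Xc

# One-unit fixed-point iteration with the tanh nonlinearity: maximize a
# negentropy-based contrast of w^T z subject to ||w|| = 1.
w = np.array([1.0, 0.5])
w /= np.linalg.norm(w)
for _ in range(200):
    y = w @ Z
    w = (Z * np.tanh(y)).mean(axis=1) - (1.0 - np.tanh(y) ** 2).mean() * w
    w /= np.linalg.norm(w)

# The extracted component matches one source up to sign and scale.
y = w @ Z
corr = max(abs(np.corrcoef(y, s)[0, 1]) for s in S)
assert corr > 0.9
```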
Pub Date: 2011-12-01 | Epub Date: 2011-10-17 | DOI: 10.1109/TNN.2011.2170180
Rafael V Borges, Artur d'Avila Garcez, Luis C Lamb
The effective integration of knowledge representation, reasoning, and learning in a robust computational model is one of the key challenges of computer science and artificial intelligence. In particular, temporal knowledge and models have been fundamental in describing the behavior of computational systems. However, acquiring correct descriptions of a system's desired behavior is a complex task. In this paper, we present a novel neural-computation model capable of representing and learning temporal knowledge in recurrent networks. The model works in an integrated fashion: it enables the effective representation of temporal knowledge, the adaptation of temporal models given a set of desirable system properties, and effective learning from examples, which in turn can lead to temporal knowledge extraction from the corresponding trained networks. The model is theoretically sound and has also been tested on a case study in the area of model verification and adaptation. The results in this paper indicate that model verification and learning can be integrated within the neural-computation paradigm, contributing to the development of predictive temporal knowledge-based systems and offering interpretable results that allow system researchers and engineers to improve their models and specifications.
The model has been implemented and is available as part of a neural-symbolic computational toolkit.
Learning and representing temporal knowledge in recurrent networks. IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2409-2421.
Pub Date: 2011-12-01 | Epub Date: 2011-10-10 | DOI: 10.1109/TNN.2011.2167020
Fopefolu Folowosele, Tara Julia Hamilton, Ralph Etienne-Cummings
There are a number of spiking and bursting neuron models of varying complexity, ranging from the simple integrate-and-fire model to the more complex Hodgkin-Huxley model. The simpler models tend to be easy to implement in silicon but are not biologically plausible, whereas the more complex models are more biologically plausible but tend to occupy a large area. In this paper, we present a 0.5 μm complementary metal-oxide-semiconductor (CMOS) implementation of the Mihalaş-Niebur neuron model, a generalized leaky integrate-and-fire neuron with an adaptive threshold, which is able to produce most of the known spiking and bursting patterns that have been observed in biology. Our implementation modifies the originally proposed model, making it more amenable to CMOS implementation and more biologically plausible. All but one of the spiking properties (tonic spiking, class 1 spiking, phasic spiking, hyperpolarized spiking, rebound spiking, spike frequency adaptation, accommodation, threshold variability, integrator, and input bistability) are demonstrated in this model.
Silicon modeling of the Mihalaş-Niebur neuron. IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1915-1927.
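A sketch of the adaptive-threshold dynamics at the heart of the model: a leaky integrate-and-fire voltage paired with a threshold that tracks the voltage and is reset at spikes. The model's two internal currents are omitted, and all parameters are illustrative (not the paper's CMOS values):

```python
# Leaky integrate-and-fire voltage V with an adaptive threshold Th,
# simulated by forward Euler under a constant input current.
EL, Vr = -70.0, -70.0        # resting and reset potentials (mV)
G, C = 0.1, 1.0              # leak conductance and capacitance
a, b = 0.001, 0.01           # threshold adaptation rates
Th_inf, Th_r = -55.0, -50.0  # threshold equilibrium and reset floor
I, dt = 2.0, 0.1             # input current, time step (ms)

V, Th = EL, Th_r
spikes = []
t = 0.0
while t < 500.0:
    V += dt * (I - G * (V - EL)) / C
    Th += dt * (a * (V - EL) - b * (Th - Th_inf))
    if V >= Th:              # spike: reset voltage, bound the threshold
        spikes.append(t)
        V = Vr
        Th = max(Th, Th_r)
    t += dt

assert len(spikes) > 5       # tonic spiking under constant input
```

Varying the adaptation rates and reset rules is what moves such a model between regimes like tonic spiking, adaptation, and accommodation.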
Pub Date: 2011-12-01 | Epub Date: 2011-10-10 | DOI: 10.1109/TNN.2011.2168422
Xin Xu, Chunming Liu, Simon X Yang, Dewen Hu
In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL); examples include least-squares policy iteration (LSPI) and its kernelized version, kernel-based LSPI (KLSPI). However, it remains difficult for API algorithms to obtain near-optimal policies for Markov decision processes (MDPs) with large or continuous state spaces. To address this problem, this paper presents a hierarchical API (HAPI) method with binary-tree state space decomposition for RL in a class of absorbing MDPs, which can be formulated as time-optimal learning control tasks. In the proposed method, after collecting samples adaptively in the state space of the original MDP, a learning-based decomposition strategy for the sample sets is designed to implement the binary-tree state space decomposition. API algorithms are then applied to the sample subsets to approximate locally optimal policies of the sub-MDPs. Because the original MDP is decomposed into a binary tree of absorbing sub-MDPs constructed during the learning process, local near-optimal policies are approximated by API algorithms with reduced complexity and higher precision. Furthermore, owing to the improved quality of the local policies, the combined global policy performs better than the near-optimal policy obtained by a single API algorithm on the original MDP. Three learning control problems, including path-tracking control of a real mobile robot, are studied to evaluate the performance of the HAPI method. With the same settings for basis function selection and sample collection, the proposed HAPI obtains better near-optimal policies than previous API methods such as LSPI and KLSPI.
Hierarchical approximate policy iteration with binary-tree state space decomposition. IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1863-1877.
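The API inner loop that HAPI builds on can be sketched as LSTD-Q policy iteration on a tiny absorbing chain with tabular (one-hot) features, where least-squares evaluation is exact. The hierarchical binary-tree decomposition itself is not shown, and the MDP is an illustrative construction:

```python
import numpy as np

# Tiny absorbing chain MDP: states 0..4, actions 0 = left / 1 = right,
# reward -1 per step, state 4 absorbing. The time-optimal policy always
# moves right.
n_s, n_a, gamma = 5, 2, 0.95

def phi(s, a):
    f = np.zeros(n_s * n_a)
    f[s * n_a + a] = 1.0            # one-hot (state, action) features
    return f

samples = []                        # (s, a, r, s', terminal)
for s in range(4):
    samples.append((s, 0, -1.0, max(s - 1, 0), False))
    samples.append((s, 1, -1.0, s + 1, s + 1 == 4))

policy = np.zeros(n_s, dtype=int)
for _ in range(10):                 # approximate policy iteration
    A = np.zeros((n_s * n_a, n_s * n_a))
    bb = np.zeros(n_s * n_a)
    for s, a, r, s2, done in samples:
        f = phi(s, a)
        f2 = np.zeros_like(f) if done else phi(s2, policy[s2])
        A += np.outer(f, f - gamma * f2)   # LSTD-Q accumulation
        bb += r * f
    w, *_ = np.linalg.lstsq(A, bb, rcond=None)
    Q = w.reshape(n_s, n_a)
    policy = Q.argmax(axis=1)              # greedy policy improvement

assert all(policy[s] == 1 for s in range(4))   # learned: always move right
```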