Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features
Jiuqi Wang, Shangtong Zhang
arXiv:2409.12135 (arXiv - CS - Machine Learning), 2024-09-18
Abstract
Temporal difference (TD) learning with linear function approximation,
abbreviated as linear TD, is a classic and powerful prediction algorithm in
reinforcement learning. While it is well understood that linear TD converges
almost surely to a unique point, this convergence traditionally requires the
assumption that the features used by the approximator are linearly independent.
However, this linear independence assumption does not hold in many practical
scenarios. This work is the first to establish the almost sure convergence of
linear TD without requiring linearly independent features. In fact, we do not
make any assumptions on the features. We prove that the approximated value
function converges to a unique point and the weight iterates converge to a set.
We also establish a notion of local stability of the weight iterates.
Importantly, we introduce no additional assumptions and make no
modification to the linear TD algorithm. Key to our
analysis is a novel characterization of bounded invariant sets of the mean ODE
of linear TD.
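As a hedged illustration (not the authors' analysis), the sketch below iterates the expected linear TD(0) update. This is an Euler discretization of the mean ODE dw/dt = Aw + b, with A = Phi^T D (gamma P - I) Phi and b = Phi^T D r. The example uses a hypothetical 3-state Markov chain whose third feature is the sum of the first two, so the features are linearly dependent. The chain, rewards, features, step size, and the helper names `values`, `expected_td_step`, and `run` are illustrative assumptions. The paper studies the sampled algorithm, which replaces the expectation with single observed transitions.

```python
# Hypothetical example (not from the paper): expected linear TD(0) updates,
# i.e. an Euler discretization of the mean ODE dw/dt = A w + b with
#   A = Phi^T D (gamma P - I) Phi,   b = Phi^T D r,
# using features that are deliberately linearly dependent.

GAMMA = 0.9
# Doubly stochastic transition matrix, so the stationary distribution is uniform.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
D = [1 / 3, 1 / 3, 1 / 3]   # stationary state distribution d(s)
R = [1.0, 0.0, -1.0]        # expected one-step reward r(s)
N_STATES = len(P)

# One feature row phi(s) per state; column 3 = column 1 + column 2, so rank(Phi) = 2.
PHI = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0],
       [1.0, 1.0, 2.0]]


def values(w, phi=PHI):
    """Approximate value function v = Phi w."""
    return [sum(f * wj for f, wj in zip(row, w)) for row in phi]


def expected_td_step(w, alpha, phi=PHI):
    """Return w + alpha * (A w + b): the TD(0) update averaged over s ~ d, s' ~ P(s, .)."""
    v = values(w, phi)
    # Expected TD error in each state: r(s) + gamma * E[v(s') | s] - v(s).
    delta = [R[s] + GAMMA * sum(P[s][t] * v[t] for t in range(N_STATES)) - v[s]
             for s in range(N_STATES)]
    # Expected update direction: sum_s d(s) * delta(s) * phi(s) = Phi^T D delta.
    g = [sum(D[s] * delta[s] * phi[s][j] for s in range(N_STATES))
         for j in range(len(w))]
    return [wj + alpha * gj for wj, gj in zip(w, g)]


def run(w0, alpha=0.5, steps=2000, phi=PHI):
    """Iterate the expected update; alpha = 0.5 is an assumed step size, stable for this example."""
    w = list(w0)
    for _ in range(steps):
        w = expected_td_step(w, alpha, phi)
    return w


if __name__ == "__main__":
    # Two initial weights with different components along the null direction (1, 1, -1) of Phi.
    for w0 in ([0.0, 0.0, 0.0], [1.0, 2.0, -3.0]):
        w = run(w0)
        print("w =", [round(x, 4) for x in w],
              "Phi w =", [round(x, 4) for x in values(w)])
```

Both runs should report the same Phi w but weights that differ by a multiple of (1, 1, -1). The update direction Phi^T D delta always lies in the row space of Phi, so the null-space component of the initial weight never changes. This mirrors, in a toy deterministic setting, the abstract's distinction between a unique limiting value function and a set of limiting weights. Dropping the third feature column restores linear independence, and the same loop then converges to a single weight vector.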