Function approximation has experienced significant success in the field of reinforcement learning (RL). Despite a handful of progress on developing theory for nonstationary RL with function approximation under structural assumptions, existing work for nonstationary RL with general function approximation is still limited. In this work, we investigate two different approaches for nonstationary RL with general function approximation: confidence-set based algorithm and UCB-type algorithm. For the first approach, we introduce a new complexity measure called dynamic Bellman Eluder (DBE) for nonstationary MDPs, and then propose a confidence-set based algorithm SW-OPEA based on the complexity metric. SW-OPEA features the sliding window mechanism and a novel confidence set design for nonstationary MDPs. For the second approach, we propose a UCB-type algorithm LSVI-Nonstationary following the popular least-square-value-iteration (LSVI) framework, and mitigate the computational efficiency challenge of the confidence-set based approach. LSVI-Nonstationary features the restart mechanism and a new design of the bonus term to handle nonstationarity. The two proposed algorithms outperform the existing algorithms for nonstationary linear and tabular MDPs in the small variation budget setting. To the best of our knowledge, the two approaches are the first confidence-set based algorithm and UCB-type algorithm in the context of nonstationary MDPs.
{"title":"Toward General Function Approximation in Nonstationary Reinforcement Learning","authors":"Songtao Feng;Ming Yin;Ruiquan Huang;Yu-Xiang Wang;Jing Yang;Yingbin Liang","doi":"10.1109/JSAIT.2024.3381818","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3381818","url":null,"abstract":"Function approximation has experienced significant success in the field of reinforcement learning (RL). Despite a handful of progress on developing theory for nonstationary RL with function approximation under structural assumptions, existing work for nonstationary RL with general function approximation is still limited. In this work, we investigate two different approaches for nonstationary RL with general function approximation: confidence-set based algorithm and UCB-type algorithm. For the first approach, we introduce a new complexity measure called dynamic Bellman Eluder (DBE) for nonstationary MDPs, and then propose a confidence-set based algorithm SW-OPEA based on the complexity metric. SW-OPEA features the sliding window mechanism and a novel confidence set design for nonstationary MDPs. For the second approach, we propose a UCB-type algorithm LSVI-Nonstationary following the popular least-square-value-iteration (LSVI) framework, and mitigate the computational efficiency challenge of the confidence-set based approach. LSVI-Nonstationary features the restart mechanism and a new design of the bonus term to handle nonstationarity. The two proposed algorithms outperform the existing algorithms for nonstationary linear and tabular MDPs in the small variation budget setting. To the best of our knowledge, the two approaches are the first confidence-set based algorithm and UCB-type algorithm in the context of nonstationary MDPs.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"190-206"},"PeriodicalIF":0.0,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140820417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-27DOI: 10.1109/JSAIT.2024.3381195
Hyun-Young Park;Seung-Hyun Nam;Si-Hyeon Lee
In this paper, we propose a new class of local differential privacy (LDP) schemes based on combinatorial block designs for discrete distribution estimation. This class not only recovers many known LDP schemes in a unified framework of combinatorial block design, but also suggests a novel way of finding new schemes achieving the exactly optimal (or near-optimal) privacy-utility trade-off with lower communication costs. Indeed, we find many new LDP schemes that achieve the exactly optimal privacy-utility trade-off, with the minimum communication cost among all the unbiased or consistent schemes, for a certain set of input data size and LDP constraint. Furthermore, to partially solve the sparse existence issue of block design schemes, we consider a broader class of LDP schemes based on regular and pairwise-balanced designs, called RPBD schemes, which relax one of the symmetry requirements on block designs. By considering this broader class of RPBD schemes, we can find LDP schemes achieving near-optimal privacy-utility trade-off with reasonably low communication costs for a much larger set of input data size and LDP constraint.
{"title":"Exactly Optimal and Communication-Efficient Private Estimation via Block Designs","authors":"Hyun-Young Park;Seung-Hyun Nam;Si-Hyeon Lee","doi":"10.1109/JSAIT.2024.3381195","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3381195","url":null,"abstract":"In this paper, we propose a new class of local differential privacy (LDP) schemes based on combinatorial block designs for discrete distribution estimation. This class not only recovers many known LDP schemes in a unified framework of combinatorial block design, but also suggests a novel way of finding new schemes achieving the exactly optimal (or near-optimal) privacy-utility trade-off with lower communication costs. Indeed, we find many new LDP schemes that achieve the exactly optimal privacy-utility trade-off, with the minimum communication cost among all the unbiased or consistent schemes, for a certain set of input data size and LDP constraint. Furthermore, to partially solve the sparse existence issue of block design schemes, we consider a broader class of LDP schemes based on regular and pairwise-balanced designs, called RPBD schemes, which relax one of the symmetry requirements on block designs. By considering this broader class of RPBD schemes, we can find LDP schemes achieving near-optimal privacy-utility trade-off with reasonably low communication costs for a much larger set of input data size and LDP constraint.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"123-134"},"PeriodicalIF":0.0,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140621268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-26DOI: 10.1109/JSAIT.2024.3381869
Alireza Sadeghi;Gang Wang;Georgios B. Giannakis
Successful training of data-intensive deep neural networks critically rely on vast, clean, and high-quality datasets. In practice however, their reliability diminishes, particularly with noisy, outlier-corrupted data samples encountered in testing. This challenge intensifies when dealing with anonymized, heterogeneous data sets stored across geographically distinct locations due to, e.g., privacy concerns. This present paper introduces robust learning frameworks tailored for centralized and federated learning scenarios. Our goal is to fortify model resilience with a focus that lies in (i) addressing distribution shifts from training to inference time; and, (ii) ensuring test-time robustness, when a trained model may encounter outliers or adversarially contaminated test data samples. To this aim, we start with a centralized setting where the true data distribution is considered unknown, but residing within a Wasserstein ball centered at the empirical distribution. We obtain robust models by minimizing the worst-case expected loss within this ball, yielding an intractable infinite-dimensional optimization problem. Upon leverage the strong duality condition, we arrive at a tractable surrogate learning problem. We develop two stochastic primal-dual algorithms to solve the resultant problem: one for $epsilon $