{"title":"Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion","authors":"Achref Jaziri, Etienne Künzel, Visvanathan Ramesh","doi":"arxiv-2408.09838","DOIUrl":null,"url":null,"abstract":"A continual learning agent builds on previous experiences to develop\nincreasingly complex behaviors by adapting to non-stationary and dynamic\nenvironments while preserving previously acquired knowledge. However, scaling\nthese systems presents significant challenges, particularly in balancing the\npreservation of previous policies with the adaptation of new ones to current\nenvironments. This balance, known as the stability-plasticity dilemma, is\nespecially pronounced in complex multi-agent domains such as the train\nscheduling problem, where environmental and agent behaviors are constantly\nchanging, and the search space is vast. In this work, we propose addressing\nthese challenges in the train scheduling problem using curriculum learning. We\ndesign a curriculum with adjacent skills that build on each other to improve\ngeneralization performance. Introducing a curriculum with distinct tasks\nintroduces non-stationarity, which we address by proposing a new algorithm:\nContinual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically\ngenerates and adjusts Q-function subspaces to handle environmental changes and\ntask requirements. CDE mitigates catastrophic forgetting through EWC while\nensuring high plasticity using adaptive rational activation functions.\nExperimental results demonstrate significant improvements in learning\nefficiency and adaptability compared to RL baselines and other adapted methods\nfor continual learning, highlighting the potential of our method in managing\nthe stability-plasticity dilemma in the adaptive train scheduling setting.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"113 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.09838","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
A continual learning agent builds on previous experiences to develop increasingly complex behaviors, adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. Scaling such systems is challenging, however, particularly when it comes to balancing the preservation of previous policies against the adaptation of new ones to the current environment. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem, where environmental and agent behaviors change constantly and the search space is vast. In this work, we address these challenges in the train scheduling problem using curriculum learning. We design a curriculum of adjacent skills that build on each other to improve generalization performance. A curriculum with distinct tasks, however, introduces non-stationarity across training stages, which we handle with a new algorithm: Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically generates and adjusts Q-function subspaces to cope with environmental changes and task requirements. CDE mitigates catastrophic forgetting through elastic weight consolidation (EWC) while maintaining high plasticity via adaptive rational activation functions. Experimental results demonstrate significant improvements in learning efficiency and adaptability compared to RL baselines and other continual learning methods adapted to this setting, highlighting the potential of our method for managing the stability-plasticity dilemma in adaptive train scheduling.
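
To make the two mechanisms named in the abstract concrete, the sketch below shows (a) a learnable rational (Padé-style) activation, whose trainable coefficients are what keeps the unit plastic across tasks, and (b) a DQN TD loss augmented with an EWC penalty that anchors weights important for earlier tasks. This is a minimal illustration under assumed interfaces, not the paper's implementation: names such as RationalActivation, dqn_loss_with_ewc, fisher, and old_params are hypothetical, and the paper's Q-function subspace expansion is not shown.

```python
# Hedged sketch of two ingredients mentioned in the abstract:
# a rational activation function and an EWC-regularized DQN loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RationalActivation(nn.Module):
    """Learnable rational activation f(x) = P(x) / Q(x) (Pade-approximant style)."""

    def __init__(self, num_degree: int = 3, den_degree: int = 2):
        super().__init__()
        # Numerator initialised near the identity; coefficients adapt during
        # training, which is the source of the activation's plasticity.
        self.p = nn.Parameter(torch.tensor([0.0, 1.0] + [0.0] * (num_degree - 1)))
        self.q = nn.Parameter(torch.zeros(den_degree))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        powers = torch.stack([x ** i for i in range(len(self.p))], dim=-1)
        numerator = (powers * self.p).sum(-1)
        den_powers = torch.stack([x ** (i + 1) for i in range(len(self.q))], dim=-1)
        denominator = 1.0 + (den_powers * self.q).abs().sum(-1)  # keep Q(x) > 0
        return numerator / denominator


def dqn_loss_with_ewc(q_net, target_net, batch, fisher, old_params,
                      gamma=0.99, ewc_lambda=100.0):
    """Standard TD loss plus an EWC penalty on parameters important for past tasks."""
    states, actions, rewards, next_states, dones = batch
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        targets = rewards + gamma * (1 - dones) * target_net(next_states).max(1).values
    td_loss = F.smooth_l1_loss(q_values, targets)

    # EWC: quadratic penalty weighted by a diagonal Fisher information estimate
    # from previous tasks; weights with large Fisher values are kept stable.
    ewc_penalty = 0.0
    for name, param in q_net.named_parameters():
        if name in fisher:
            ewc_penalty = ewc_penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return td_loss + ewc_lambda * ewc_penalty
```

The ewc_lambda coefficient trades stability against plasticity directly: larger values pull parameters toward their previous-task values, while the rational activations let the network reshape its nonlinearities cheaply when a new curriculum stage arrives.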