Curriculum learning in reinforcement learning sequences simpler tasks to speed up learning on more complex target problems. Existing methods typically fall into two categories: those that train a teacher policy (the curriculum strategy) concurrently with a student policy (the learning agent), and those that selectively sample tasks from a task distribution based on the student policy's experience. The main drawback of the first approach is its computational cost, since it requires training the low-level (student) and high-level (teacher) reinforcement learning policies simultaneously. Selective-sampling methods, in turn, assume the agent seeks to maximize accumulated reward across all tasks, which becomes problematic when the primary objective is to master a specific target task, making them less effective in scenarios that require focused learning. Our work addresses the setting in which a teacher must train a new student within a new, short episode. This constraint compels the teacher to quickly learn curriculum planning by identifying the most suitable tasks. We evaluated our framework in several challenging scenarios, including a partially observable grid-world navigation environment and the procedurally generated open-world environment Crafter.
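To make the contrast concrete, the following Python sketch illustrates a simple threshold-based teacher curriculum against a uniform task-sampling baseline on a toy chain of prerequisite tasks. It is an illustration only, not the method proposed here; all names (ToyStudent, curriculum_teacher, uniform_baseline, the competence model) are hypothetical.

```python
# Minimal toy contrast between a teacher-style curriculum and uniform sampling
# over a task distribution. Purely illustrative; not the proposed framework.
import random

NUM_TASKS = 5            # tasks ordered from easiest (0) to the target task (4)
MASTERY_THRESHOLD = 0.9  # return level at which a task counts as mastered

class ToyStudent:
    """Stand-in for the learning agent: competence on a task grows faster
    when the prerequisite (previous) task is already mastered."""
    def __init__(self):
        self.competence = [0.0] * NUM_TASKS

    def train_on(self, task: int) -> float:
        prereq = 1.0 if task == 0 else self.competence[task - 1]
        gain = 0.1 * (0.2 + 0.8 * prereq)  # easy tasks are always learnable
        self.competence[task] = min(1.0, self.competence[task] + gain)
        return self.competence[task]       # proxy for the episodic return

def curriculum_teacher(student: ToyStudent, steps: int) -> None:
    """Teacher-style curriculum: present the easiest not-yet-mastered task,
    advancing toward the target once the student's return crosses the threshold."""
    for _ in range(steps):
        task = next((t for t in range(NUM_TASKS)
                     if student.competence[t] < MASTERY_THRESHOLD), NUM_TASKS - 1)
        student.train_on(task)

def uniform_baseline(student: ToyStudent, steps: int) -> None:
    """Naive baseline: tasks drawn uniformly from the task distribution,
    spreading effort across all tasks rather than focusing on the target."""
    for _ in range(steps):
        student.train_on(random.randrange(NUM_TASKS))

if __name__ == "__main__":
    random.seed(0)
    a, b = ToyStudent(), ToyStudent()
    curriculum_teacher(a, steps=80)
    uniform_baseline(b, steps=80)
    print("curriculum, target-task competence:", round(a.competence[-1], 2))
    print("uniform,    target-task competence:", round(b.competence[-1], 2))
```

In this toy setup the curriculum reaches the hard target task only after its prerequisites are mastered, which mirrors the sequencing idea described above; the real problem additionally requires the teacher to plan such a sequence quickly for a new student within a short episode.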