基于强化学习的物联网系统多目标资源调度

IF 1.6 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Journal of Low Power Electronics and Applications Pub Date : 2022-10-08 DOI:10.3390/jlpea12040053

Shaswot Shresthamali, Masaaki Kondo, Hiroshi Nakamura

{"title":"基于强化学习的物联网系统多目标资源调度","authors":"Shaswot Shresthamali, Masaaki Kondo, Hiroshi Nakamura","doi":"10.3390/jlpea12040053","DOIUrl":null,"url":null,"abstract":"IoT embedded systems have multiple objectives that need to be maximized simultaneously. These objectives conflict with each other due to limited resources and tradeoffs that need to be made. This requires multi-objective optimization (MOO) and multiple Pareto-optimal solutions are possible. In such a case, tradeoffs are made w.r.t. a user-defined preference. This work presents a general Multi-objective Reinforcement Learning (MORL) framework for MOO of IoT embedded systems. This framework comprises a general Multi-objective Markov Decision Process (MOMDP) formulation and two novel low-compute MORL algorithms. The algorithms learn policies to tradeoff between multiple objectives using a single preference parameter. We take the energy scheduling problem in general Energy Harvesting Wireless Sensor Nodes (EHWSNs) as a case example in which a sensor node is required to maximize its sensing rate, and transmission performance as well as ensure long-term uninterrupted operation within a very tight energy budget. We simulate single-task and dual-task EHWSN systems to evaluate our framework.. The results demonstrate that our MORL algorithms can learn better policies at lower learning costs and successfully tradeoff between multiple objectives at runtime.","PeriodicalId":38100,"journal":{"name":"Journal of Low Power Electronics and Applications","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2022-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Multi-Objective Resource Scheduling for IoT Systems Using Reinforcement Learning\",\"authors\":\"Shaswot Shresthamali, Masaaki Kondo, Hiroshi Nakamura\",\"doi\":\"10.3390/jlpea12040053\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"IoT embedded systems have multiple objectives that need to be maximized simultaneously. These objectives conflict with each other due to limited resources and tradeoffs that need to be made. This requires multi-objective optimization (MOO) and multiple Pareto-optimal solutions are possible. In such a case, tradeoffs are made w.r.t. a user-defined preference. This work presents a general Multi-objective Reinforcement Learning (MORL) framework for MOO of IoT embedded systems. This framework comprises a general Multi-objective Markov Decision Process (MOMDP) formulation and two novel low-compute MORL algorithms. The algorithms learn policies to tradeoff between multiple objectives using a single preference parameter. We take the energy scheduling problem in general Energy Harvesting Wireless Sensor Nodes (EHWSNs) as a case example in which a sensor node is required to maximize its sensing rate, and transmission performance as well as ensure long-term uninterrupted operation within a very tight energy budget. We simulate single-task and dual-task EHWSN systems to evaluate our framework.. The results demonstrate that our MORL algorithms can learn better policies at lower learning costs and successfully tradeoff between multiple objectives at runtime.\",\"PeriodicalId\":38100,\"journal\":{\"name\":\"Journal of Low Power Electronics and Applications\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2022-10-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Low Power Electronics and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/jlpea12040053\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Low Power Electronics and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jlpea12040053","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 1

摘要

物联网嵌入式系统有多个目标，需要同时最大化。由于资源有限和需要进行权衡，这些目标相互冲突。这需要多目标优化（MOO），并且多个Pareto最优解是可能的。在这种情况下，会根据用户定义的偏好进行权衡。本文提出了一个用于物联网嵌入式系统MOO的通用多目标强化学习（MORL）框架。该框架包括一个通用的多目标马尔可夫决策过程（MOMDP）公式和两个新的低计算量MORL算法。算法学习使用单个偏好参数在多个目标之间进行权衡的策略。我们以一般能量采集无线传感器节点（EHWSN）中的能量调度问题为例，其中传感器节点需要最大限度地提高其感测速率和传输性能，并确保在非常紧张的能量预算内长期不间断地运行。我们模拟了单任务和双任务EHWSN系统来评估我们的框架。。结果表明，我们的MORL算法可以以较低的学习成本学习更好的策略，并在运行时成功地在多个目标之间进行权衡。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Multi-Objective Resource Scheduling for IoT Systems Using Reinforcement Learning

IoT embedded systems have multiple objectives that need to be maximized simultaneously. These objectives conflict with each other due to limited resources and tradeoffs that need to be made. This requires multi-objective optimization (MOO) and multiple Pareto-optimal solutions are possible. In such a case, tradeoffs are made w.r.t. a user-defined preference. This work presents a general Multi-objective Reinforcement Learning (MORL) framework for MOO of IoT embedded systems. This framework comprises a general Multi-objective Markov Decision Process (MOMDP) formulation and two novel low-compute MORL algorithms. The algorithms learn policies to tradeoff between multiple objectives using a single preference parameter. We take the energy scheduling problem in general Energy Harvesting Wireless Sensor Nodes (EHWSNs) as a case example in which a sensor node is required to maximize its sensing rate, and transmission performance as well as ensure long-term uninterrupted operation within a very tight energy budget. We simulate single-task and dual-task EHWSN systems to evaluate our framework.. The results demonstrate that our MORL algorithms can learn better policies at lower learning costs and successfully tradeoff between multiple objectives at runtime.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊