Deep reinforcement learning-based scheduling in distributed systems: a critical review

Journal: Knowledge and Information Systems. Published: 2024-06-26. DOI: 10.1007/s10115-024-02167-7
Impact factor: 2.5; JCR Q3 (Computer Science, Artificial Intelligence); CAS Zone 4 (Computer Science)

Zahra Jalali Khalil Abadi, Najme Mansouri, Mohammad Masoud Javidi

Abstract

Many fields of research, including astronomy, earth science, and bioinformatics, rely on parallel and distributed computing environments. As client requests grow, service providers face challenges such as task scheduling, security, resource management, and virtual machine migration. Because NP-hard scheduling problems have large solution spaces, finding an optimal or near-optimal solution can take a long time. With recent advances in artificial intelligence, deep reinforcement learning (DRL) can be applied to scheduling problems. DRL combines the representational power of deep neural networks with the feedback-based learning of reinforcement learning. This paper provides a comprehensive overview of DRL-based scheduling algorithms in distributed systems, categorized by algorithm and application. The reviewed articles are assessed by their main objectives, their quality-of-service and scheduling parameters, and their evaluation environments (i.e., simulation tools or real-world environments). The review indicates that RL-based algorithms, such as Q-learning, are effective for learning scaling and scheduling policies in cloud environments. Finally, the paper summarizes open challenges and directions for further research on DRL for scheduling (e.g., edge intelligence, an ideal dynamic task scheduling framework, human–machine interaction, resource-hungry artificial intelligence (AI), and sustainability).
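
The abstract's observation that RL algorithms such as Q-learning can learn scheduling policies in a cloud setting can be illustrated with a minimal tabular Q-learning sketch. Everything in it is a hypothetical toy assumption rather than a method from the surveyed papers: three VMs with assumed per-step service probabilities (`SERVICE_PROB`), a state made of capped queue lengths, an action that picks which VM receives the next task, and a reward equal to the negative length of the chosen queue. The names `step`, `train`, `greedy_policy`, and `evaluate` are illustrative only.

```python
import random
from collections import defaultdict

# Hypothetical toy setting (not from the surveyed papers): one task arrives per step
# and must be assigned to one of NUM_VMS virtual machines with different speeds.
NUM_VMS = 3
MAX_QUEUE = 5                    # queue lengths are capped to keep the state space small
SERVICE_PROB = (0.9, 0.5, 0.2)   # assumed chance that each busy VM finishes one task per step


def step(state, action, rng):
    """Assign one task to VM `action`, then let each busy VM finish a task with some probability.

    Returns (next_state, reward); the reward is the negative length of the chosen
    queue (after adding the task), a rough proxy for how long the new task will wait.
    """
    queues = list(state)
    queues[action] = min(queues[action] + 1, MAX_QUEUE)
    reward = -queues[action]
    for i in range(NUM_VMS):
        if queues[i] > 0 and rng.random() < SERVICE_PROB[i]:
            queues[i] -= 1
    return tuple(queues), reward


def greedy_policy(q, state):
    """Pick the VM with the highest learned Q-value (ties go to the lowest index)."""
    return max(range(NUM_VMS), key=lambda a: q[state][a])


def train(episodes=2000, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * NUM_VMS)  # Q[state][action], initialised to 0
    for _ in range(episodes):
        state = (0,) * NUM_VMS                 # every episode starts with empty queues
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(NUM_VMS)   # explore
            else:
                action = greedy_policy(q, state)  # exploit
            next_state, reward = step(state, action, rng)
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            td_target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    return q


def evaluate(policy, steps=5000, seed=1):
    """Average per-task reward of `policy` (a function mapping state -> VM index)."""
    rng = random.Random(seed)
    state, total = (0,) * NUM_VMS, 0.0
    for _ in range(steps):
        state, reward = step(state, policy(state), rng)
        total += reward
    return total / steps


if __name__ == "__main__":
    q = train()
    policy_rng = random.Random(2)
    learned = evaluate(lambda s: greedy_policy(q, s))
    baseline = evaluate(lambda s: policy_rng.randrange(NUM_VMS))
    print(f"learned policy: {learned:.2f}  random policy: {baseline:.2f}")
```

The lookup table is used only to keep the sketch dependency-free. Replacing it with a neural network that approximates Q(s, a), as in a deep Q-network (DQN), is the basic step from tabular RL to the DRL-style schedulers discussed here, and it is what lets such methods scale to state spaces far too large to enumerate.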

Source journal: Knowledge and Information Systems (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 5.70

Self-citation rate: 7.40%

Articles published: 152

Review time: 7.2 months

About the journal: Knowledge and Information Systems (KAIS) provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. This monthly peer-reviewed archival journal publishes state-of-the-art research reports on emerging topics in KAIS, reviews of important techniques in related areas, and application papers of interest to a general readership.