Reinforcement Learning based Scheduling for Spark Jobs in Cloud Environment
Vishnu Prasad Verma, Nenavath Srinivas Naik, Santosh Kumar
2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), 2022-12-02
DOI: 10.1109/UPCON56432.2022.9986440
Recently, the big data computing paradigm has proliferated because of its wide application in processing enormous volumes of data to produce meaningful information. Big data computing frameworks process data either in the cloud or on physical on-premises infrastructure. Cloud service providers offer flexible, affordable, and reliable resources that are easier to manage than on-premises physical data centers, so many organizations are now moving their big data computing frameworks to the cloud. However, scheduling Spark jobs efficiently in a cloud environment is challenging because the scheduler must simultaneously reduce the cost of the virtual machines used, improve system performance by lowering the average job completion time, and meet the service level agreements (SLAs) of the jobs. Numerous heuristic-based solutions exist in the literature, but they do not work well in heterogeneous cloud environments, where many constraints must be satisfied while scheduling jobs. In this paper, we therefore optimize the use of computing resources in a cloud environment by analyzing reinforcement learning based scheduling of Spark jobs. The proposed analysis, presented as a case study, demonstrates how a reinforcement learning algorithm enables an agent to learn the inherent properties of the computing environment in order to schedule jobs.
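The abstract does not specify the agent's state, action, or reward design. As a purely illustrative sketch, and not the authors' method, the Python snippet below applies tabular Q-learning to a toy version of the problem: the state is the class of the arriving Spark job, the action is the VM type to run it on, and the reward penalizes VM cost, completion time, and SLA violations. The VM types, job classes, prices, deadlines, penalty, and hyperparameters are all hypothetical assumptions chosen for illustration.

```python
import random

# Hypothetical VM types: (name, speed in work units per hour, price per hour).
VM_TYPES = [("small", 1.0, 1.0), ("large", 4.0, 16.0)]
# Hypothetical job classes: (name, work units, SLA deadline in hours).
JOB_CLASSES = [("short", 2.0, 4.0), ("long", 8.0, 4.0)]
SLA_PENALTY = 30.0  # assumed penalty for missing a job's deadline


def reward(job: int, vm: int) -> float:
    """Negative weighted cost of running `job` on `vm` (higher is better)."""
    _, work, deadline = JOB_CLASSES[job]
    _, speed, price = VM_TYPES[vm]
    completion_time = work / speed
    cost = completion_time * price
    r = -(cost + completion_time)
    if completion_time > deadline:
        r -= SLA_PENALTY  # SLA violated
    return r


def best_action(q: list[list[float]], state: int) -> int:
    """Index of the VM type with the highest Q-value in `state`."""
    return max(range(len(VM_TYPES)), key=lambda a: q[state][a])


def train(steps: int = 30_000, alpha: float = 0.02, gamma: float = 0.3,
          epsilon: float = 0.1, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning: state = class of the arriving job, action = VM type."""
    rng = random.Random(seed)
    q = [[0.0] * len(VM_TYPES) for _ in JOB_CLASSES]
    state = rng.randrange(len(JOB_CLASSES))
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(len(VM_TYPES))
        else:
            action = best_action(q, state)
        r = reward(state, action)
        # Next job arrives; here its class is assumed uniformly random.
        next_state = rng.randrange(len(JOB_CLASSES))
        # Q-learning update towards r + gamma * max_a' Q(s', a').
        target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q: list[list[float]]) -> dict[str, str]:
    """Map each job class name to the VM type name the agent would choose."""
    return {JOB_CLASSES[s][0]: VM_TYPES[best_action(q, s)][0]
            for s in range(len(JOB_CLASSES))}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these assumed numbers, the learned greedy policy should send short jobs to the cheap small VM, which still meets their deadline, and long jobs to the expensive large VM, which avoids the SLA penalty. Because the next job's class here does not depend on the placement, the discount factor shifts all Q-values of a state equally; a richer state, such as per-VM queue lengths, would let placements influence future decisions.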