Para:在大数据分析中获取CPU时间片段

Q1 Computer Science IEEE Cloud Computing Pub Date : 2021-09-01 DOI:10.1109/CLOUD53861.2021.00081

Yuzhao Wang, Hongliang Qu, Junqing Yu, Zhibin Yu

{"title":"Para:在大数据分析中获取CPU时间片段","authors":"Yuzhao Wang, Hongliang Qu, Junqing Yu, Zhibin Yu","doi":"10.1109/CLOUD53861.2021.00081","DOIUrl":null,"url":null,"abstract":"Modern data analytics typically run tasks on statically reserved resources (e.g., CPU and memory), which is prone to over-provision to guarantee the Quality of Service (QoS), leading to a large amount of resource time fragments. As a result, the resource utilization of a data analytics cluster is severely under-utilized. Workload co-location on shared resources has been substantially studied, but they are unaware the sizes of resource time fragments, making them hard to improve the resource utilization and guarantee QoS at the same time. In this paper, we propose Para, an event-driven scheduling mechanism, to harvest the CPU time fragments in co-located big data analytic workloads. Para innovates three techniques: 1) identifying the Idle CPU Time Window (ICTW) associated with each CPU core by capturing the task-switch event; 2) designing a runtime communication mechanism between each task execution of a workload and the underlying resource management system; 3) designing a pull-based scheduler to schedule a workload to run in the ICTW of another workload. We implement Para based on Apache Mesos and Spark. And the experimental results show that Para improves the CPU utilization by 44% and 30% on average relative to the original Mesos and enhanced Mesos under Spark's dynamic mode (MSDM), respectively. Moreover, Para increases the averaged task throughput of Mesos and MSDM by 4.8x and 1.7x, respectively, while guaranteeing the execution time of the primary applications.","PeriodicalId":54281,"journal":{"name":"IEEE Cloud Computing","volume":"51 1","pages":"625-636"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Para: Harvesting CPU time fragments in Big Data Analytics\",\"authors\":\"Yuzhao Wang, Hongliang Qu, Junqing Yu, Zhibin Yu\",\"doi\":\"10.1109/CLOUD53861.2021.00081\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern data analytics typically run tasks on statically reserved resources (e.g., CPU and memory), which is prone to over-provision to guarantee the Quality of Service (QoS), leading to a large amount of resource time fragments. As a result, the resource utilization of a data analytics cluster is severely under-utilized. Workload co-location on shared resources has been substantially studied, but they are unaware the sizes of resource time fragments, making them hard to improve the resource utilization and guarantee QoS at the same time. In this paper, we propose Para, an event-driven scheduling mechanism, to harvest the CPU time fragments in co-located big data analytic workloads. Para innovates three techniques: 1) identifying the Idle CPU Time Window (ICTW) associated with each CPU core by capturing the task-switch event; 2) designing a runtime communication mechanism between each task execution of a workload and the underlying resource management system; 3) designing a pull-based scheduler to schedule a workload to run in the ICTW of another workload. We implement Para based on Apache Mesos and Spark. And the experimental results show that Para improves the CPU utilization by 44% and 30% on average relative to the original Mesos and enhanced Mesos under Spark's dynamic mode (MSDM), respectively. Moreover, Para increases the averaged task throughput of Mesos and MSDM by 4.8x and 1.7x, respectively, while guaranteeing the execution time of the primary applications.\",\"PeriodicalId\":54281,\"journal\":{\"name\":\"IEEE Cloud Computing\",\"volume\":\"51 1\",\"pages\":\"625-636\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Cloud Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CLOUD53861.2021.00081\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Cloud Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLOUD53861.2021.00081","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Computer Science","Score":null,"Total":0}

引用次数: 1

摘要

现代数据分析通常在静态预留的资源(如CPU和内存)上运行任务，为了保证服务质量(QoS)，容易出现过度配置，导致大量的资源时间碎片。因此，数据分析集群的资源利用率严重不足。在共享资源上的工作负载共址已经有了大量的研究，但是他们并不知道资源时间片段的大小，这使得他们很难在提高资源利用率的同时保证QoS。在本文中，我们提出了一种事件驱动的调度机制Para，以获取同址大数据分析工作负载中的CPU时间片段。Para创新了三种技术:1)通过捕获任务切换事件来识别与每个CPU核心相关的空闲CPU时间窗口(ICTW);2)设计工作负载各任务执行与底层资源管理系统之间的运行时通信机制;3)设计一个基于pull的调度器来调度一个工作负载在另一个工作负载的ICTW中运行。我们基于Apache Mesos和Spark实现Para。实验结果表明，在Spark的动态模式(MSDM)下，相对于原始Mesos和增强Mesos, Para的CPU利用率分别提高了44%和30%。此外，Para在保证主应用程序执行时间的前提下，使Mesos和MSDM的平均任务吞吐量分别提高了4.8倍和1.7倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Para: Harvesting CPU time fragments in Big Data Analytics

Modern data analytics typically run tasks on statically reserved resources (e.g., CPU and memory), which is prone to over-provision to guarantee the Quality of Service (QoS), leading to a large amount of resource time fragments. As a result, the resource utilization of a data analytics cluster is severely under-utilized. Workload co-location on shared resources has been substantially studied, but they are unaware the sizes of resource time fragments, making them hard to improve the resource utilization and guarantee QoS at the same time. In this paper, we propose Para, an event-driven scheduling mechanism, to harvest the CPU time fragments in co-located big data analytic workloads. Para innovates three techniques: 1) identifying the Idle CPU Time Window (ICTW) associated with each CPU core by capturing the task-switch event; 2) designing a runtime communication mechanism between each task execution of a workload and the underlying resource management system; 3) designing a pull-based scheduler to schedule a workload to run in the ICTW of another workload. We implement Para based on Apache Mesos and Spark. And the experimental results show that Para improves the CPU utilization by 44% and 30% on average relative to the original Mesos and enhanced Mesos under Spark's dynamic mode (MSDM), respectively. Moreover, Para increases the averaged task throughput of Mesos and MSDM by 4.8x and 1.7x, respectively, while guaranteeing the execution time of the primary applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Cloud Computing Computer Science-Computer Networks and Communications

CiteScore

11.20

自引率

0.00%

发文量

期刊介绍： Cessation. IEEE Cloud Computing is committed to the timely publication of peer-reviewed articles that provide innovative research ideas, applications results, and case studies in all areas of cloud computing. Topics relating to novel theory, algorithms, performance analyses and applications of techniques are covered. More specifically: Cloud software, Cloud security, Trade-offs between privacy and utility of cloud, Cloud in the business environment, Cloud economics, Cloud governance, Migrating to the cloud, Cloud standards, Development tools, Backup and recovery, Interoperability, Applications management, Data analytics, Communications protocols, Mobile cloud, Private clouds, Liability issues for data loss on clouds, Data integration, Big data, Cloud education, Cloud skill sets, Cloud energy consumption, The architecture of cloud computing, Applications in commerce, education, and industry, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), Business Process as a Service (BPaaS)