Para: Harvesting CPU time fragments in Big Data Analytics
Yuzhao Wang, Hongliang Qu, Junqing Yu, Zhibin Yu
IEEE Cloud Computing, vol. 51, no. 1, pp. 625-636, September 2021. DOI: 10.1109/CLOUD53861.2021.00081
Modern data analytics systems typically run tasks on statically reserved resources (e.g., CPU and memory). To guarantee Quality of Service (QoS), these reservations tend to be over-provisioned, which leaves many resource time fragments unused. As a result, data analytics clusters are severely under-utilized. Co-locating workloads on shared resources has been studied extensively, but existing approaches are unaware of the sizes of resource time fragments, which makes it hard for them to improve resource utilization and guarantee QoS at the same time. In this paper, we propose Para, an event-driven scheduling mechanism that harvests CPU time fragments in co-located big data analytics workloads. Para introduces three techniques: 1) identifying the Idle CPU Time Window (ICTW) of each CPU core by capturing task-switch events; 2) a runtime communication mechanism between each task execution of a workload and the underlying resource management system; and 3) a pull-based scheduler that schedules a workload to run in the ICTW of another workload. We implement Para on Apache Mesos and Spark. Experimental results show that Para improves CPU utilization by 44% and 30% on average relative to the original Mesos and to Mesos under Spark's dynamic mode (MSDM), respectively. Moreover, Para increases average task throughput by 4.8x relative to Mesos and 1.7x relative to MSDM, while guaranteeing the execution time of the primary applications.
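The abstract names Para's mechanisms without detailing them. The Python sketch below illustrates how techniques 1 and 3 could fit together, under stated assumptions: a core's ICTW is estimated from primary-task switch events (modeled as hypothetical `on_primary_task_end` / `on_primary_task_start` callbacks), the window predictor conservatively uses the smallest recently observed idle gap, and every secondary task carries a runtime estimate. All class and method names are hypothetical and are not Para's, Mesos's, or Spark's APIs. The runtime communication mechanism (technique 2) is reduced here to direct method calls.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional

# Hypothetical sketch of ICTW tracking plus pull-based scheduling;
# not Para's actual implementation.


@dataclass
class Task:
    """A secondary-workload task with an assumed runtime estimate (seconds)."""
    name: str
    est_runtime: float


@dataclass
class CoreState:
    busy: bool = True                      # a primary task is running on this core
    idle_since: Optional[float] = None     # when the current idle window opened
    recent_gaps: Deque[float] = field(default_factory=lambda: deque(maxlen=8))

    def predicted_window(self) -> float:
        # Assumed conservative ICTW estimate: smallest recently observed idle gap.
        return min(self.recent_gaps) if self.recent_gaps else 0.0


class IctwTracker:
    """Derives per-core idle windows from primary-task switch events."""

    def __init__(self, num_cores: int) -> None:
        self.cores: Dict[int, CoreState] = {c: CoreState() for c in range(num_cores)}

    def on_primary_task_end(self, core: int, t: float) -> None:
        state = self.cores[core]
        state.busy = False
        state.idle_since = t

    def on_primary_task_start(self, core: int, t: float) -> None:
        state = self.cores[core]
        if state.idle_since is not None:
            # Record the length of the idle window that just closed.
            state.recent_gaps.append(t - state.idle_since)
        state.busy = True
        state.idle_since = None

    def idle_cores(self) -> List[int]:
        return [c for c, s in self.cores.items() if not s.busy]


class PullScheduler:
    """An idle core pulls a secondary task only if it fits the remaining window."""

    def __init__(self, tracker: IctwTracker, queue: List[Task]) -> None:
        self.tracker = tracker
        self.queue = queue

    def pull(self, core: int, now: float) -> Optional[Task]:
        state = self.tracker.cores[core]
        if state.busy or state.idle_since is None:
            return None  # no open window on this core
        remaining = state.predicted_window() - (now - state.idle_since)
        for i, task in enumerate(self.queue):
            if task.est_runtime <= remaining:
                return self.queue.pop(i)  # first task that fits the window
        return None


if __name__ == "__main__":
    tracker = IctwTracker(num_cores=2)
    tracker.on_primary_task_end(0, 1.0)
    tracker.on_primary_task_start(0, 1.5)   # observed gap: 0.5 s
    tracker.on_primary_task_end(0, 2.0)
    tracker.on_primary_task_start(0, 2.3)   # observed gap: about 0.3 s
    tracker.on_primary_task_end(0, 3.0)     # core 0 is idle again
    sched = PullScheduler(tracker, [Task("big", 1.0), Task("small", 0.2)])
    # Only "small" fits the roughly 0.25 s left in core 0's predicted window.
    print(sched.pull(0, now=3.05))
```

Having the idle core pull work, instead of a central scheduler pushing it, means a secondary task starts only when a window is known to be open. The min-gap predictor is one possible conservative choice: it harvests less idle time but lowers the risk of delaying the primary application.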
About the journal:
Publication status: ceased.
IEEE Cloud Computing is committed to the timely publication of peer-reviewed articles that provide innovative research ideas, application results, and case studies in all areas of cloud computing. It covers novel theory, algorithms, performance analyses, and applications of techniques. Specific topics include: Cloud software, Cloud security, Trade-offs between privacy and utility of cloud, Cloud in the business environment, Cloud economics, Cloud governance, Migrating to the cloud, Cloud standards, Development tools, Backup and recovery, Interoperability, Applications management, Data analytics, Communications protocols, Mobile cloud, Private clouds, Liability issues for data loss on clouds, Data integration, Big data, Cloud education, Cloud skill sets, Cloud energy consumption, The architecture of cloud computing, Applications in commerce, education, and industry, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and Business Process as a Service (BPaaS).