Analysis and Approximation of Optimal Co-Scheduling on CMP

IEEE Translation Journal on Magnetics in Japan. Pub Date: 2011-01-01. DOI: 10.21220/S2-TJMJ-8K82

Xipeng Shen, Yunlian Jiang


Citations: 2

Abstract

In recent years, increasing design complexity and the problems of power and heat dissipation have shifted processor technology toward Chip Multiprocessors (CMP). In a CMP architecture, multiple cores commonly share some on-chip cache. This sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling tackles the problem by assigning jobs to cores so that contention and the consequent performance degradation are minimized. This dissertation addresses two of the most prominent challenges in job co-scheduling. The first challenge is the computational complexity of determining optimal job co-schedules. This dissertation presents one of the first systematic analyses of the complexity of job co-scheduling. Besides proving the NP-completeness of job co-scheduling, it introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. For complex cases, it proposes several heuristics-based algorithms and empirically demonstrates that they can approximate the optimal schedules effectively. These discoveries facilitate the assessment of job co-schedulers by providing necessary baselines, and offer insights for the development of practical co-scheduling systems. The second challenge is predicting the performance of processes co-running on a shared cache. This dissertation explores how co-runners, program inputs, and cache configurations influence co-run performance prediction. Through a sequence of formal analyses, we derive an analytical co-run locality model, uncovering the inherent statistical connections between the data references of programs' single runs and their co-run locality. The model offers theoretical insights into co-run locality analysis and leads to a lightweight approach for fast prediction of shared cache performance. We demonstrate the effectiveness of the model in enabling proactive job co-scheduling. Together, the findings along these two dimensions open up many new opportunities for cache management on modern CMPs by laying the foundation for job co-scheduling and significantly enhancing the understanding of data locality and cache sharing.
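To make the notion of an optimal co-schedule concrete, the following is a minimal sketch, not the dissertation's algorithm. It assumes a dual-core setting in which jobs are paired onto chips that each share one cache, and it assumes the degradation of every possible pair is already known. The function name `best_pairing`, the `COST` table, and its values (percent slowdown) are all hypothetical. The search is a plain exhaustive enumeration, which is only practical for a handful of jobs.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]

# Hypothetical co-run degradation (percent slowdown) for each pair of jobs.
# These numbers are made up for illustration; they are not measured data.
COST: Dict[FrozenSet[int], int] = {
    frozenset((0, 1)): 30,
    frozenset((0, 2)): 10,
    frozenset((0, 3)): 20,
    frozenset((1, 2)): 25,
    frozenset((1, 3)): 5,
    frozenset((2, 3)): 40,
}


def best_pairing(jobs: List[int],
                 cost: Dict[FrozenSet[int], int]) -> Tuple[float, List[Pair]]:
    """Exhaustively find the pairing of jobs with the lowest total degradation.

    Assumes each chip hosts exactly two jobs, so the job count must be even.
    """
    if len(jobs) % 2 != 0:
        raise ValueError("number of jobs must be even")
    if not jobs:
        return 0, []

    first, rest = jobs[0], jobs[1:]
    best_total: float = float("inf")
    best_pairs: List[Pair] = []

    # Try every possible partner for the first job, then solve the remainder.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_pairing(remaining, cost)
        total = cost[frozenset((first, partner))] + sub_total
        if total < best_total:
            best_total = total
            best_pairs = [(first, partner)] + sub_pairs

    return best_total, best_pairs


total, pairs = best_pairing([0, 1, 2, 3], COST)
print(total, pairs)
```

With the four hypothetical jobs above there are only three possible pairings, and the sketch selects jobs 0 and 2 on one chip and jobs 1 and 3 on the other, for a total degradation of 15. The number of pairings grows very quickly with the job count, which is one reason a brute-force baseline like this does not scale and more structured algorithms or heuristics are needed.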


