Analysis and approximation of optimal co-scheduling on CMP

Xipeng Shen, Yunlian Jiang
DOI: 10.21220/S2-TJMJ-8K82 (https://doi.org/10.21220/S2-TJMJ-8K82)
Publication date: 2011-01-01
Journal (as listed in the page metadata): IEEE Translation Journal on Magnetics in Japan, volume 40 1
Citations: 2

Abstract

In recent years, increasing design complexity and the problems of power and heat dissipation have shifted processor technology toward Chip Multiprocessors (CMP). In a CMP architecture, multiple cores commonly share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling tackles the problem by assigning jobs to cores so that the contention and the consequent performance degradation are minimized. This dissertation aims to tackle two of the most prominent challenges in job co-scheduling. The first challenge is the computational complexity of determining optimal job co-schedules. This dissertation presents one of the first systematic analyses of the complexity of job co-scheduling. Besides proving the NP-completeness of job co-scheduling, it introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. For complex cases, it proposes several heuristics-based algorithms and empirically demonstrates that they can approximate the optimal schedules effectively. These discoveries facilitate the assessment of job co-schedulers by providing necessary baselines, and offer insights into the development of practical co-scheduling systems. The second challenge resides in predicting the performance of processes co-running on a shared cache. This dissertation explores the influence of co-runners, program inputs, and cache configurations on co-run performance prediction. Through a sequence of formal analyses, we derive an analytical co-run locality model, uncovering the inherent statistical connections between the data references of programs' single runs and their co-run locality.
The model offers theoretical insights into co-run locality analysis and leads to a lightweight approach for fast prediction of shared-cache performance. We demonstrate the effectiveness of the model in enabling proactive job co-scheduling. Together, the findings along these two dimensions open up many new opportunities for cache management on modern CMPs by laying the foundation for job co-scheduling and significantly enhancing the understanding of data locality and cache sharing.
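
To make the co-scheduling problem described in the abstract concrete, the following is a minimal sketch, not the dissertation's algorithms. It assumes dual-core chips where the two cores of a chip share a cache, an even number of jobs, and a made-up table of pairwise co-run degradations (the job names, the numbers, and the function names `pair_cost` and `optimal_pairing` are all hypothetical). It finds the pairing with the smallest total degradation by exhaustive search, which is only practical for a handful of jobs.

```python
from typing import Dict, List, Tuple

# Hypothetical data: DEGRADATION[(a, b)] is the slowdown (in percent) that
# job a suffers when it shares a cache with job b. Values are illustrative.
DEGRADATION: Dict[Tuple[str, str], int] = {
    ("A", "B"): 30, ("B", "A"): 25,
    ("A", "C"): 5,  ("C", "A"): 10,
    ("A", "D"): 12, ("D", "A"): 8,
    ("B", "C"): 20, ("C", "B"): 15,
    ("B", "D"): 6,  ("D", "B"): 4,
    ("C", "D"): 18, ("D", "C"): 22,
}


def pair_cost(deg: Dict[Tuple[str, str], int], a: str, b: str) -> int:
    """Total degradation when jobs a and b run on the same chip."""
    return deg[(a, b)] + deg[(b, a)]


def optimal_pairing(
    jobs: List[str], deg: Dict[Tuple[str, str], int]
) -> Tuple[float, List[Tuple[str, str]]]:
    """Exhaustively search all ways to split jobs into co-running pairs.

    Returns (minimum total degradation, list of pairs). Assumes an even
    number of jobs, i.e. every core of every dual-core chip is occupied.
    """
    if len(jobs) % 2 != 0:
        raise ValueError("this sketch assumes an even number of jobs")
    if not jobs:
        return 0, []

    first, rest = jobs[0], jobs[1:]
    best_cost: float = float("inf")
    best_pairs: List[Tuple[str, str]] = []

    # Pair the first job with each candidate partner, then solve the
    # remaining jobs recursively and keep the cheapest combination.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_cost, sub_pairs = optimal_pairing(remaining, deg)
        cost = sub_cost + pair_cost(deg, first, partner)
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(first, partner)] + sub_pairs

    return best_cost, best_pairs


cost, pairs = optimal_pairing(["A", "B", "C", "D"], DEGRADATION)
print(cost, pairs)  # prints: 25 [('A', 'C'), ('B', 'D')]
```

With these illustrative numbers, the three possible pairings cost 95 (A with B, C with D), 25 (A with C, B with D), and 55 (A with D, B with C), so the search picks the second. The number of pairings grows very quickly with the number of jobs, which gives a feel for why the abstract turns to graph-theoretic and Integer/Linear Programming formulations and to heuristics for larger cases.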