Smoothing on Dynamic Concurrency Throttling

Janaina Schwarzrock, Hiago Rocha, A. Lorenzon, A. C. S. Beck
{"title":"Smoothing on Dynamic Concurrency Throttling","authors":"Janaina Schwarzrock, Hiago Rocha, A. Lorenzon, A. C. S. Beck","doi":"10.1109/IPDPSW55747.2022.00154","DOIUrl":null,"url":null,"abstract":"Technology scaling has been allowing a growing number of cores in processors to satisfy the increasing demand of new applications, which need to process huge amounts of data in High-Performance Computing (HPC). However, considering that many parallel applications have limited scalability, not always activating the maximum number of available cores to execute an application will provide the best outcome in energy and performance (represented by the Energy-Delay Product, or EDP). Because of that, many works have already proposed different Dynamic Concurrency Throttling (DCT) techniques to adapt thread count as the application executes. However, dynamically tuning (i.e. increasing or decreasing) thread count implies in overheads related to caches warm-up and thread reallocation across the cores. This overhead may become very significant when thread count changes often during execution, which may overcome the benefits brought by DCT. This problem is further aggravated in Non-Uniform Memory Access (NUMA) systems since some cores are more distant, in terms of latency, than others. With that in mind, in this paper, we propose a smoothing-based strategy to minimize the thread count changes and, consequently, mitigate the aforementioned overhead. Our proposal is generic and aims further to improve the optimization results of any DCT technique. As case-study, we performed experiments on two multicore systems with nine well-known benchmarks, showing that our smoothing technique improves EDP results of offline and online state-of-the-art DCT techniques by up to 93% and 89% (both 22% on mean), respectively.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW55747.2022.00154","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Technology scaling has allowed a growing number of cores per processor to satisfy the increasing demands of new applications, which must process huge amounts of data in High-Performance Computing (HPC). However, because many parallel applications have limited scalability, activating the maximum number of available cores to execute an application does not always provide the best outcome in energy and performance (represented by the Energy-Delay Product, or EDP). For that reason, many works have proposed Dynamic Concurrency Throttling (DCT) techniques that adapt the thread count as the application executes. However, dynamically tuning (i.e., increasing or decreasing) the thread count incurs overheads related to cache warm-up and thread reallocation across cores. This overhead may become very significant when the thread count changes often during execution and may outweigh the benefits brought by DCT. The problem is further aggravated in Non-Uniform Memory Access (NUMA) systems, since some cores are more distant from each other, in terms of latency, than others. With that in mind, in this paper we propose a smoothing-based strategy to minimize thread count changes and, consequently, mitigate the aforementioned overhead. Our proposal is generic and aims to further improve the optimization results of any DCT technique. As a case study, we performed experiments on two multicore systems with nine well-known benchmarks, showing that our smoothing technique improves the EDP results of state-of-the-art offline and online DCT techniques by up to 93% and 89% (both 22% on average), respectively.
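The abstract does not detail the smoothing filter itself, but the general idea of damping a DCT policy's per-region thread-count decisions can be sketched as follows. This is a minimal illustration, not the authors' implementation: dct_choose_threads is a hypothetical placeholder for any offline or online DCT policy, and the EMA weight ALPHA and hysteresis margin MARGIN are illustrative values, not parameters taken from the paper.

```c
/* Sketch: smoothing the raw decisions of a DCT policy before applying them,
 * so that the thread count is reconfigured less often. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define ALPHA  0.4   /* EMA weight: smaller values smooth more aggressively (illustrative) */
#define MARGIN 3     /* reconfigure only when the smoothed target moves at least this far  */

/* Toy stand-in for a DCT policy: oscillates between two thread counts to
 * mimic a noisy per-region decision. A real policy would come from the
 * offline or online DCT technique being smoothed. */
static int dct_choose_threads(int region_id)
{
    return (region_id % 2 == 0) ? 4 : 12;
}

int main(void)
{
    const int max_threads = omp_get_max_threads();
    double smoothed = max_threads;   /* filtered thread-count estimate   */
    int    active   = max_threads;   /* thread count currently in force  */

    for (int r = 0; r < 16; r++) {
        int raw = dct_choose_threads(r);                    /* raw DCT decision */
        smoothed = ALPHA * raw + (1.0 - ALPHA) * smoothed;  /* EMA filter       */

        int target = (int)lround(smoothed);
        /* Hysteresis: ignore small oscillations so caches stay warm and
         * threads are not reshuffled across NUMA nodes for marginal gains. */
        if (abs(target - active) >= MARGIN) {
            active = target;
            omp_set_num_threads(active);
        }
        printf("region %2d: raw=%2d smoothed=%5.2f active=%2d\n",
               r, raw, smoothed, active);

        #pragma omp parallel
        {
            /* ... parallel region body ... */
        }
    }
    return 0;
}
```

Compile with gcc -fopenmp -lm. The filter plus hysteresis prevents omp_set_num_threads from being called for every small oscillation of the raw policy, which is exactly the reconfiguration overhead the paper targets, and which is costliest on NUMA machines where reallocation moves threads across distant cores.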