Smoothing on Dynamic Concurrency Throttling
Janaina Schwarzrock, Hiago Rocha, A. Lorenzon, A. C. S. Beck
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Publication date: 2022-05-01
DOI: https://doi.org/10.1109/IPDPSW55747.2022.00154
Citation count: 1
Abstract
Technology scaling has allowed a growing number of cores per processor to meet the increasing demands of new applications that process huge amounts of data in High-Performance Computing (HPC). However, because many parallel applications have limited scalability, activating the maximum number of available cores does not always yield the best outcome in energy and performance (represented by the Energy-Delay Product, or EDP). For this reason, many works have proposed Dynamic Concurrency Throttling (DCT) techniques that adapt the thread count as the application executes. However, dynamically tuning (i.e., increasing or decreasing) the thread count incurs overheads related to cache warm-up and thread reallocation across cores. This overhead may become significant when the thread count changes often during execution, and it may outweigh the benefits brought by DCT. The problem is further aggravated in Non-Uniform Memory Access (NUMA) systems, since some cores are more distant than others in terms of latency. With that in mind, we propose a smoothing-based strategy that minimizes thread count changes and, consequently, mitigates this overhead. Our proposal is generic and aims to further improve the optimization results of any DCT technique. As a case study, we performed experiments on two multicore systems with nine well-known benchmarks, showing that our smoothing technique improves the EDP results of state-of-the-art offline and online DCT techniques by up to 93% and 89%, respectively (22% on average in both cases).
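The abstract does not specify the smoothing algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a DCT technique has produced one thread count per parallel region, and it applies a sliding-window median to that plan to remove short-lived changes. The names `count_changes`, `smooth`, and `edp`, the window size, and the sample plan are all hypothetical and chosen for illustration.

```python
from statistics import median_low


def count_changes(plan):
    """Count adjacent parallel regions whose thread counts differ."""
    return sum(1 for a, b in zip(plan, plan[1:]) if a != b)


def smooth(plan, window=3):
    """Apply a centered sliding-window median to a thread-count plan.

    Assumption: a median filter stands in for the smoothing step.
    The window is truncated at both ends of the plan, and median_low
    ensures the result is always a thread count present in the window.
    """
    half = window // 2
    return [
        median_low(plan[max(0, i - half): i + half + 1])
        for i in range(len(plan))
    ]


def edp(energy_joules, time_seconds):
    """Energy-Delay Product: energy multiplied by execution time."""
    return energy_joules * time_seconds


# Hypothetical per-region thread counts chosen by some DCT technique.
plan = [8, 8, 2, 8, 8, 16, 16, 4, 16, 16]
smoothed = smooth(plan)

print(count_changes(plan), "changes before smoothing")
print(count_changes(smoothed), "changes after smoothing")
```

Whether a smoothed plan actually lowers EDP depends on the measured energy and time of each run. The sketch only reduces the number of thread count changes, which is the source of the cache warm-up and thread reallocation overhead described above.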