Tiago Knorst, Guilherme Korol, M. Jordan, J. Vicenzi, A. Lorenzon, M. B. Rutzig, A. C. S. Beck
2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI), published 2022-08-22. DOI: 10.1109/SBCCI55532.2022.9893223 (https://doi.org/10.1109/SBCCI55532.2022.9893223)
On the benefits of Collaborative Thread Throttling and HLS-Versioning in CPU-FPGA Environments
Cloud environments have been steadily adopting collaborative CPU-FPGA architectures to accelerate applications by partitioning the execution of their kernels across both devices. However, exploiting the optimization techniques that both architectures offer is challenging: they must be applied selectively, depending on the application at hand and the optimization target (e.g., performance or energy). This work therefore investigates the impact of collaboratively applying thread throttling (i.e., artificially decreasing the number of active threads) on the CPU side and HLS (High-Level Synthesis)-versioning on the FPGA side. We use a multi-tenant cloud service as our object of study, in which sequences of application requests with different priorities result in DAGs of application kernels that must be executed on the heterogeneous architecture. We show that synergistically applying thread throttling and HLS-versioning to the incoming kernels may improve the Energy-Delay Product by up to 41x over the default, non-optimized execution.
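To make the optimization metric concrete, the following is a minimal sketch of how an Energy-Delay Product (energy multiplied by execution time) could be used to pick one combination of CPU thread count and FPGA kernel version. It is not the authors' method: the `Config` type, the version labels, and all energy and delay values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    threads: int       # active CPU threads (fewer = more throttling)
    hls_version: str   # hypothetical label for one FPGA kernel variant


def edp(energy_j: float, delay_s: float) -> float:
    """Energy-Delay Product: energy (J) multiplied by execution time (s)."""
    return energy_j * delay_s


def best_config(measurements: dict) -> Config:
    """Return the configuration with the lowest EDP."""
    return min(measurements, key=lambda c: edp(*measurements[c]))


# Made-up (energy in J, delay in s) pairs; these are NOT results from the paper.
measurements = {
    Config(8, "baseline"): (50.0, 4.0),
    Config(8, "unrolled"): (40.0, 2.0),
    Config(4, "baseline"): (30.0, 5.0),
    Config(4, "unrolled"): (24.0, 2.5),
}

baseline = Config(8, "baseline")  # assumed default: no throttling, default kernel
chosen = best_config(measurements)
improvement = edp(*measurements[baseline]) / edp(*measurements[chosen])
print(chosen, f"EDP improvement over baseline: {improvement:.2f}x")
```

With these illustrative numbers, neither the fastest nor the lowest-energy setting is selected on one axis alone; the combination that minimizes the product wins, which is the kind of joint trade-off the abstract refers to.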