Improving runtime performance and energy consumption through balanced data locality with NUMA-BTLP and NUMA-BTDM static algorithms for thread classification and thread type-aware mapping
Iulia Ştirb
Int. J. Comput. Sci. Eng., published 2020-05-08
DOI: 10.1504/ijcse.2020.10029352 (https://doi.org/10.1504/ijcse.2020.10029352)
Extending compilers like LLVM with NUMA-aware optimisations significantly improves runtime performance and reduces energy consumption on NUMA systems. The paper presents the NUMA-BTDM algorithm, a compile-time, thread-type-dependent mapping algorithm that maps threads uniformly according to the type assigned to each thread by the NUMA-BTLP algorithm, which classifies threads through static analysis of the code. First, the compiler inserts architecture-dependent code into the program that detects, at runtime, the characteristics of the underlying architecture for Intel processors. The mapping is then performed at runtime (using specific function calls from the PThreads library) according to these characteristics, following a compile-time mapping analysis that gives the CPU affinity of each thread. NUMA-BTDM allows the application to customise, control and optimise the thread mapping, and achieves balanced data locality on NUMA systems for parallel C code that combines PThreads-based task parallelism with OpenMP-based loop parallelism.
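
To make the runtime step concrete, the following is a minimal sketch of how a thread could be given a CPU affinity through the PThreads library on Linux with glibc. It is not the paper's inserted code. The abstract does not name the exact calls, so `pthread_setaffinity_np` is an assumption about which function is meant, and `choose_cpu`, `pin_thread_to_cpu` and `first_allowed_cpu` are hypothetical helper names. The round-robin rule in `choose_cpu` is only a stand-in for the compile-time mapping analysis, and it assumes that logical CPUs are numbered contiguously within each NUMA node, which is not guaranteed on real hardware.

```c
#define _GNU_SOURCE /* exposes pthread_setaffinity_np and the CPU_* macros on glibc */
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical placement rule (NOT the paper's analysis): spread threads
 * round-robin across NUMA nodes first, then across the cores of each node.
 * Assumes CPUs are numbered contiguously per node. Returns -1 on bad input. */
static int choose_cpu(int thread_index, int num_nodes, int cpus_per_node)
{
    if (thread_index < 0 || num_nodes <= 0 || cpus_per_node <= 0)
        return -1;
    int node = thread_index % num_nodes;
    int core = (thread_index / num_nodes) % cpus_per_node;
    return node * cpus_per_node + core;
}

/* Restrict `thread` to a single logical CPU.
 * Returns 0 on success, otherwise an errno-style error code. */
static int pin_thread_to_cpu(pthread_t thread, int cpu)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

/* Lowest-numbered CPU the calling thread is currently allowed to run on,
 * or -1 if the affinity mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

In a program, a thread would typically call `pin_thread_to_cpu(pthread_self(), choose_cpu(i, nodes, cores))` once at start-up, where `nodes` and `cores` come from whatever architecture detection is available. Compile with `-pthread`.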