{"title":"Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing","authors":"Richard M. Yoo, C. Hughes, K. Lai, Ravi Rajwar","doi":"10.1145/2503210.2503232","DOIUrl":null,"url":null,"abstract":"Intel has recently introduced Intel® Transactional Synchronization Extensions (Intel® TSX) in the Intel 4th Generation Core™ Processors. With Intel TSX, a processor can dynamically determine whether threads need to serialize through lock-protected critical sections. In this paper, we evaluate the first hardware implementation of Intel TSX using a set of high-performance computing (HPC) workloads, and demonstrate that applying Intel TSX to these workloads can provide significant performance improvements. On a set of real-world HPC workloads, applying Intel TSX provides an average speedup of 1.41x. When applied to a parallel user-level TCP/IP stack, Intel TSX provides 1.31x average bandwidth improvement on network intensive applications. We also demonstrate the ease with which we were able to apply Intel TSX to the various workloads.","PeriodicalId":371074,"journal":{"name":"2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"277","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2503210.2503232","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 277
Abstract
Intel has recently introduced Intel® Transactional Synchronization Extensions (Intel® TSX) in the Intel 4th Generation Core™ Processors. With Intel TSX, a processor can dynamically determine whether threads need to serialize through lock-protected critical sections. In this paper, we evaluate the first hardware implementation of Intel TSX using a set of high-performance computing (HPC) workloads, and demonstrate that applying Intel TSX to these workloads can provide significant performance improvements. On a set of real-world HPC workloads, applying Intel TSX provides an average speedup of 1.41x. When applied to a parallel user-level TCP/IP stack, Intel TSX provides 1.31x average bandwidth improvement on network intensive applications. We also demonstrate the ease with which we were able to apply Intel TSX to the various workloads.