{"title":"HSLOT: The HERCULES Scriptable Loop Transformations Engine","authors":"Christos Kartsaklis, Eunjung Park, John Cavazos","doi":"10.1109/WOLFHPC.2014.10","DOIUrl":null,"url":null,"abstract":"HSLOT arms users with a rich set of configurable transformation directives, to be used as-they-are or to be specialized and combined into powerful custom transformations. We offer a plethora of loop transformations, which includes both the classic set (unroll, fuse, fission, tile, and so on) as well as unique ones (specialize, swap nest, split, fork, and so on) that are not found in other state-of-the-art systems. We show how HSLOT enables more transformations such as merging two loops that cannot be fused because of data dependencies and how HSLOT can be used in a simple and systematic fashion to improve memory accesses and expose better parallelism. To use our system, users simply annotate loops with the transformations sequence and compile with our Open64-based HSLOTimplementing Fortran compiler, HSLF90, which produces both object files and optionally source. We describe our experiment results using a set of scientific kernels written in Fortran with HSLOT directives on AMD 32 core system.","PeriodicalId":59014,"journal":{"name":"高性能计算技术","volume":"40 1","pages":"31-41"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"高性能计算技术","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.1109/WOLFHPC.2014.10","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
HSLOT arms users with a rich set of configurable transformation directives, to be used as-they-are or to be specialized and combined into powerful custom transformations. We offer a plethora of loop transformations, which includes both the classic set (unroll, fuse, fission, tile, and so on) as well as unique ones (specialize, swap nest, split, fork, and so on) that are not found in other state-of-the-art systems. We show how HSLOT enables more transformations such as merging two loops that cannot be fused because of data dependencies and how HSLOT can be used in a simple and systematic fashion to improve memory accesses and expose better parallelism. To use our system, users simply annotate loops with the transformations sequence and compile with our Open64-based HSLOTimplementing Fortran compiler, HSLF90, which produces both object files and optionally source. We describe our experiment results using a set of scientific kernels written in Fortran with HSLOT directives on AMD 32 core system.