{"title":"Model-based input-adaptive vectorization","authors":"Kirshanthan Sundararajah, Sanath Jayasena","doi":"10.1109/MERCON.2016.7480117","DOIUrl":null,"url":null,"PeriodicalId":184790,"journal":{"name":"2016 Moratuwa Engineering Research Conference (MERCon)","volume":"264 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 Moratuwa Engineering Research Conference (MERCon)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MERCON.2016.7480117","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
During execution, a program does not always use all the bits of a variable. Identifying the minimum number of bits needed to represent a variable can expose optimization opportunities. A compilation and execution framework that knows these bitwidths can use them to optimize execution, for instance by selecting instructions for SIMD vectorization. This paper introduces a framework that exploits vectorization opportunities hidden in a program that are not exposed at static compilation time. The framework unlocks instruction-level data parallelism by using the bitwidths of array-like variables that depend on runtime input. On our microbenchmark suite, it shows a maximum achievable performance gain of 37% and a mean achievable performance gain of 11% over the ICC compiler.
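
To make the bitwidth idea concrete, here is a minimal sketch in Python of how the bitwidth of runtime input could determine how many elements fit in one SIMD register. It is an illustration only and not the authors' framework: the function names, the candidate lane widths (8, 16, 32, 64 bits), the 128-bit register size, and the restriction to unsigned values are all assumptions made for this example.

```python
def required_bits(values):
    """Minimum number of bits needed to represent every value (unsigned only)."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative values only")
    # int.bit_length() is 0 for the value 0, so reserve at least one bit.
    return max(max(values, default=0).bit_length(), 1)


def choose_lane_width(values, lane_widths=(8, 16, 32, 64)):
    """Pick the narrowest assumed lane width that can hold every value."""
    bits = required_bits(values)
    for width in lane_widths:
        if bits <= width:
            return width
    raise ValueError(f"values need {bits} bits, wider than any candidate lane")


def lanes_per_register(values, register_bits=128):
    """Number of elements processed per SIMD operation for this input."""
    return register_bits // choose_lane_width(values)


if __name__ == "__main__":
    small_input = [3, 17, 255]      # fits in 8 bits
    large_input = [3, 17, 70000]    # needs 17 bits, so 32-bit lanes
    print(lanes_per_register(small_input))
    print(lanes_per_register(large_input))
```

Under these assumptions, the same loop declared over 32-bit integers could process four times as many elements per instruction when the input happens to fit in 8 bits, which is the kind of opportunity a static compiler cannot see because the value range is only known at run time.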