Optimization of Automatic Conversion of Serial C to Parallel OpenMP
D. Dheeraj, B. Nitish, S. Ramesh
2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2012-10-10
DOI: 10.1109/CyberC.2012.59 (https://doi.org/10.1109/CyberC.2012.59)
Citations: 8
Abstract
This paper implements a technique that improves the parallel execution of auto-generated OpenMP programs by taking the architecture of the on-chip cache memory into account. It avoids false sharing in for-loops by generating OpenMP code that schedules chunks dynamically, with the data handled by each core placed one data cache line apart. The open-source parallelization tool Par4All has been analyzed and its capabilities have been exploited to maximize hardware utilization. Several computationally intensive programs from PolyBench have been tested on different architectures with different data sets, and the results show that the OpenMP code generated by the enhanced technique achieves a considerable speedup.
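
The idea of spacing each core's work one cache line apart can be illustrated with a small hand-written sketch. This is not the code Par4All emits and not the authors' implementation. The 64-byte line size and the helper names `chunk_for_line` and `scale` are assumptions made for illustration. The chunk size passed to `schedule(dynamic, ...)` is chosen so that one chunk covers exactly one cache line of elements, which keeps two threads from writing to the same line when the array itself starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes. 64 is common, but it is
 * architecture dependent and should be checked on the target. */
#define CACHE_LINE_BYTES 64

/* Hypothetical helper: number of elements of size elem_size that
 * fit in one cache line, and never less than 1. */
static size_t chunk_for_line(size_t elem_size)
{
    assert(elem_size > 0);
    size_t n = CACHE_LINE_BYTES / elem_size;
    return n > 0 ? n : 1;
}

/* Hypothetical kernel: multiply every element of a[0..n) by factor.
 * Each dynamically scheduled chunk spans one cache line of doubles,
 * so neighbouring chunks do not share a line if a is line-aligned.
 * Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
static void scale(double *a, size_t n, double factor)
{
    const int chunk = (int)chunk_for_line(sizeof a[0]);

    #pragma omp parallel for schedule(dynamic, chunk)
    for (long i = 0; i < (long)n; i++) {
        a[i] *= factor;
    }
}
```

With a compiler such as GCC, building with `-fopenmp` enables the pragma. Alignment of the array is left to the caller here (for example via C11 `aligned_alloc`), since a misaligned array can still let two chunks straddle a shared line.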