{"title":"Auto-tuning TRSM with an asynchronous task assignment model on multicore, multi-GPU and coprocessor systems","authors":"Clícia Pinto, Marcos E. Barreto, M. Boratto","doi":"10.1109/AICCSA.2016.7945637","DOIUrl":null,"url":null,"abstract":"The increasing need for computing power today justifies the continuous search for techniques that decrease the time to answer usual computational problems. To take advantage of new hybrid parallel architectures composed by multithreading and multiprocessor hardware, our current efforts involve the design and validation of highly parallel algorithms that efficently explore the characteristics of such architectures. In this paper, we propose an automatic tuning methodology to easily exploit multicore, multi-GPU and coprocessor systems. We present an optimization of an algorithm for solving triangular systems (TRSM), based on block decomposition and asynchronous task assignment, and discuss some results.","PeriodicalId":448329,"journal":{"name":"2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA)","volume":"235 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICCSA.2016.7945637","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The increasing need for computing power today justifies the continuous search for techniques that decrease the time to answer usual computational problems. To take advantage of new hybrid parallel architectures composed by multithreading and multiprocessor hardware, our current efforts involve the design and validation of highly parallel algorithms that efficently explore the characteristics of such architectures. In this paper, we propose an automatic tuning methodology to easily exploit multicore, multi-GPU and coprocessor systems. We present an optimization of an algorithm for solving triangular systems (TRSM), based on block decomposition and asynchronous task assignment, and discuss some results.