Algorithm for Cooperative CPU-GPU Computing
Razvan-Mihai Aciu, H. Ciocarlie
2013 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), published 2013-09-23
DOI: 10.1109/SYNASC.2013.53 (https://doi.org/10.1109/SYNASC.2013.53)
Citations: 3
Abstract
Many applications have modules that could benefit greatly from the massive parallel numeric computing power of GPUs; renderers, signal processors, and simulators are only a few examples. However, GPU weaknesses such as a stackless execution model and poor support for exchanging pointers with the host mean that it is sometimes not feasible to port an entire algorithm to the GPU, even when the algorithm is highly parallel and some of its parts could be greatly accelerated there. In such situations, the programmer needs a framework that splits the code flow of a thread into parts, each of which runs on the most suitable computing resource, CPU or GPU. For GPU execution, data from multiple host threads is collected and processed on the GPU, and the results are returned to the originating threads so that they can resume execution on the host. In this paper we propose such an algorithm, analyze it, and evaluate its practical results.
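The collect, run, and return cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class `BatchDispatcher`, its parameters `max_batch` and `wait_s`, and the `square_all` kernel are hypothetical, and the GPU launch is simulated by an ordinary function that processes a whole batch at once. Each host thread blocks in `submit()` at the point where its code flow is split; a dispatcher thread gathers pending requests into one batch, runs the batch, and wakes each thread with its own result. Device memory transfers and error propagation from the kernel are omitted.

```python
import queue
import threading
from typing import Callable, List


class _Request:
    """One unit of work submitted by a host thread, plus a slot for its result."""
    __slots__ = ("arg", "result", "done")

    def __init__(self, arg: float) -> None:
        self.arg = arg
        self.result = None
        self.done = threading.Event()


class BatchDispatcher:
    """Hypothetical sketch: collects requests from many host threads and runs
    them as one batch. `kernel` stands in for a GPU launch; it receives a list
    of inputs and must return a list of outputs in the same order."""

    def __init__(self, kernel: Callable[[List[float]], List[float]],
                 max_batch: int = 64, wait_s: float = 0.01) -> None:
        self._kernel = kernel
        self._max_batch = max_batch
        self._wait_s = wait_s
        self._queue: "queue.Queue[_Request]" = queue.Queue()
        self._stop = threading.Event()
        self.batches_run = 0
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def submit(self, arg: float) -> float:
        """Called from a host thread: enqueue, block, then resume with the result."""
        req = _Request(arg)
        self._queue.put(req)
        req.done.wait()
        return req.result

    def _loop(self) -> None:
        while not self._stop.is_set():
            try:
                first = self._queue.get(timeout=self._wait_s)
            except queue.Empty:
                continue
            batch = [first]
            # Gather whatever else arrives within a short window, up to max_batch.
            while len(batch) < self._max_batch:
                try:
                    batch.append(self._queue.get(timeout=self._wait_s))
                except queue.Empty:
                    break
            # Simulated GPU step: one call processes the whole batch.
            outputs = self._kernel([r.arg for r in batch])
            self.batches_run += 1
            # Hand each result back to the thread that submitted it.
            for req, out in zip(batch, outputs):
                req.result = out
                req.done.set()

    def close(self) -> None:
        self._stop.set()
        self._worker.join()


def square_all(xs: List[float]) -> List[float]:
    # Placeholder for a data-parallel GPU kernel.
    return [x * x for x in xs]


dispatcher = BatchDispatcher(square_all, max_batch=16)
results: List[float] = [0.0] * 32


def host_thread(i: int) -> None:
    # CPU part before the offloaded step would run here.
    y = dispatcher.submit(float(i))  # batched together with other threads
    # CPU part after the thread resumes.
    results[i] = y + 1.0


threads = [threading.Thread(target=host_thread, args=(i,)) for i in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()
dispatcher.close()
```

Batching by a short time window (`wait_s`) trades a little latency for larger batches. A real implementation would size batches to the GPU's capacity and copy inputs and outputs to and from device memory, which this sketch does not attempt.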