Hybrid CPU-GPU Acceleration of a Multithreaded Image Stitching Algorithm
Shewit W. Tesfay, Zeynep G. Demirdag, H. F. Ugurdag, H. Ateş
2022 7th International Conference on Computer Science and Engineering (UBMK), published 2022-09-14
DOI: https://doi.org/10.1109/UBMK55850.2022.9919473
Abstract
Real-time image stitching is critical, especially in unmanned aerial vehicles, and its acceleration has received attention in recent years. This paper describes an image stitching acceleration scheme for heterogeneous (CPU+GPU) devices. We explore acceleration with both multithreading and multiprocessing. The most time-critical functions in the algorithm are offloaded to the GPU. We designed a 3-buffer ping-pong mechanism for synchronization and data transfer among threads/processes in order to maximize CPU utilization. We carried out our experiments on an Nvidia Jetson AGX Xavier. Results show a speedup of more than 3x.
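The abstract does not spell out how the 3-buffer ping-pong mechanism works, so the following is only a generic sketch of one common way to rotate a fixed pool of three buffers between a producer stage and a consumer stage. Plain Python threads stand in for the paper's CPU and GPU stages, and every name here (`run_pipeline`, `free_q`, `ready_q`, `process`) is hypothetical and not taken from the paper.

```python
import queue
import threading

NUM_BUFFERS = 3  # matches the "3-buffer" wording in the abstract


def run_pipeline(frames, process):
    """Pass frames from a producer thread to a consumer thread through a
    fixed pool of NUM_BUFFERS reusable buffers (illustrative sketch only)."""
    buffers = [{"data": None} for _ in range(NUM_BUFFERS)]
    free_q = queue.Queue()   # indices of buffers the producer may fill
    ready_q = queue.Queue()  # indices of buffers holding unprocessed data
    for i in range(NUM_BUFFERS):
        free_q.put(i)
    results = []

    def producer():
        for frame in frames:
            idx = free_q.get()            # blocks only when all buffers are busy
            buffers[idx]["data"] = frame  # fill the buffer
            ready_q.put(idx)              # hand it to the consumer
        ready_q.put(None)                 # sentinel: no more frames

    def consumer():
        while True:
            idx = ready_q.get()
            if idx is None:
                break
            results.append(process(buffers[idx]["data"]))
            free_q.put(idx)               # return the buffer for reuse

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With three buffers, the producer can fill one buffer while the consumer works on another and a third sits ready, so neither side waits unless one stage is consistently slower than the other. A call such as `run_pipeline(range(10), lambda x: x * 2)` returns the processed items in their original order, since both queues are first-in, first-out and there is a single producer and a single consumer.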