Acceleration of variance of color differences-based demosaicing using CUDA

2012 International Conference on High Performance Computing & Simulation (HPCS); Pub Date: 2012-07-02; DOI: 10.1109/HPCSim.2012.6266965

Muhammad Ismail Faruqi, Fumihiko Ino, K. Hagihara


Citation count: 1

Abstract

Image demosaicing algorithms reconstruct a full-color image from the incomplete color samples (RAW data) output by an image sensor overlaid with a Color Filter Array (CFA). Better demosaicing algorithms are superior in acuity, dynamic range, signal-to-noise ratio, and artifact suppression, which makes them suitable for high-quality delivery such as theatrical broadcast. In this paper, we examine the feasibility of exploiting the Graphics Processing Unit (GPU) as an emerging accelerator to create an on-the-fly implementation of Variance of Color Differences (VCD) demosaicing, a state-of-the-art heuristic demosaicing algorithm developed to eliminate false-color artifacts in textured regions of images. Our contributions are 1) implementing the algorithm as several kernels, to separate its bottleneck portion from the rest and to minimize idle threads, and 2) reducing I/O between shared and global memory during green channel interpolation by separating the input RAW data into four channels. We then compare the implementation featuring both acceleration methods with a single-kernel implementation. In our experiments, the proposed acceleration methods achieved a per-frame processing time of 343 ms on an nVidia GTX 480, which translates into 2.95 fps. The proposed methods also improved the kernel time and the effective memory bandwidth by a factor of 2.1× compared with the single-kernel counterpart.
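
The second method, separating the input RAW data into four channels, can be illustrated with a small CPU-side sketch. This is not the authors' CUDA kernel: the abstract does not state the CFA layout, so the code assumes an RGGB Bayer pattern, and the function name `split_bayer_rggb` is hypothetical. It only shows the data rearrangement, in which each 2×2 CFA cell contributes one sample to each of four half-resolution planes.

```python
def split_bayer_rggb(raw):
    """Split a Bayer mosaic into four half-resolution planes.

    Assumption (not from the paper): the mosaic uses an RGGB layout, so
    every 2x2 cell is [[R, G1], [G2, B]]. `raw` is a list of equal-length rows.
    """
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("expected even image dimensions")

    # Even rows hold R (even columns) and G1 (odd columns).
    r = [row[0::2] for row in raw[0::2]]
    g1 = [row[1::2] for row in raw[0::2]]
    # Odd rows hold G2 (even columns) and B (odd columns).
    g2 = [row[0::2] for row in raw[1::2]]
    b = [row[1::2] for row in raw[1::2]]
    return r, g1, g2, b


# A 4x4 mosaic whose tens digit encodes the channel: 1=R, 2=G1, 3=G2, 4=B.
raw = [
    [10, 20, 11, 21],
    [30, 40, 31, 41],
    [12, 22, 13, 23],
    [32, 42, 33, 43],
]
r, g1, g2, b = split_bayer_rggb(raw)
print(r)   # samples taken from even rows, even columns
print(g1)  # samples taken from even rows, odd columns
```

After the split, samples of the same color sit next to each other in memory. A plausible reading of the abstract is that this lets the green interpolation step load its neighborhoods with fewer transfers between global and shared memory, but the exact memory layout used in the paper is not given here.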
