{"title":"An efficient parallel motion estimation algorithm and X264 parallelization in CUDA","authors":"Youngsub Ko, Youngmin Yi, S. Ha","doi":"10.1109/DASIP.2011.6136860","DOIUrl":null,"url":null,"abstract":"H.264/AVC video encoders have been widely used for its high coding efficiency. Since the computational demand proportional to the frame resolution is constantly increasing, it has been of great interest to accelerate H.264/AVC by parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general purpose applications by exploiting fine-grain data parallelisms. Despite extensive research effort to use GPUs to accelerate the H.264/AVC algorithm, it has not been successful to achieve any speed-up over the x264 algorithm that is known as the fastest CPU implementation because of significant communication overhead between the host CPU and the GPU and intra-frame dependency in the algorithm. In this paper, we propose a novel motion estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, to effectively hide the communication overhead between the host CPU and the GPU. The proposed H.264 encoder achieves more than 20% speed-up compared with x264.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"21 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DASIP.2011.6136860","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12
Abstract
H.264/AVC video encoders have been widely used for its high coding efficiency. Since the computational demand proportional to the frame resolution is constantly increasing, it has been of great interest to accelerate H.264/AVC by parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general purpose applications by exploiting fine-grain data parallelisms. Despite extensive research effort to use GPUs to accelerate the H.264/AVC algorithm, it has not been successful to achieve any speed-up over the x264 algorithm that is known as the fastest CPU implementation because of significant communication overhead between the host CPU and the GPU and intra-frame dependency in the algorithm. In this paper, we propose a novel motion estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, to effectively hide the communication overhead between the host CPU and the GPU. The proposed H.264 encoder achieves more than 20% speed-up compared with x264.