Bart Pieters, Charles-Frederik Hollemeersch, J. D. Cock, W. D. Neve, P. Lambert, R. Walle
Parallel deblocking filtering in H.264/AVC using multiple CPUs and GPUs
Proceedings of the 20th ACM international conference on Multimedia, 2012-10-29
DOI: 10.1145/2393347.2396370 (https://doi.org/10.1145/2393347.2396370)
Citations: 1
Abstract
Deblocking filtering in the H.264/AVC standard is a computationally complex process because of the filter's high content adaptivity. Furthermore, the deblocking filter introduces a significant number of data dependencies, which makes parallel processing non-trivial. Our previous work analyzed the dependencies of the filter and proposed a massively parallel implementation tailored for execution on a single GPU. In this paper, we extend that work by proposing a parallel processing scheme that accelerates deblocking filtering using multiple CPU cores or GPUs. The scheme allows for standard-compliant filtering, regardless of slice configuration. Results show that a multi-GPU implementation of the proposed scheme achieves faster-than-real-time deblocking of 1080p video pictures at over 3794 frames per second using three GPUs. A multi-core CPU implementation using 8 CPU cores deblocks 1080p video pictures at up to 695 frames per second.