K. Messaoudi, E. Bourennane, S. Toumi, Gilberto Ochoa
Performance comparison of two hardware implementations of the deblocking filter used in H.264 by changing the utilized data width
International Workshop on Systems, Signal Processing and their Applications (WOSSPA), 2011-05-09
DOI: https://doi.org/10.1109/WOSSPA.2011.5931411
Citations: 7
Abstract
The deblocking filter is more complex than the other modules in H.264 because it is highly adaptive, is applied to every boundary of all 4×4 blocks, and can update up to three pixels on each side of a boundary. After careful study and analysis of this filter, we conclude that its complexity lies in the data dependencies and in the control module of the elementary filters that compose it, not in the type of these filters. In this paper, we propose two hardware implementations of the deblocking filter that share the same memory-management strategy and differ only in data width: the first uses a 32-bit data path and the second a 128-bit data path. The 128-bit width is chosen to ensure a high degree of parallelism and to avoid transpose circuits and intermediate buffers between the elementary modules of the filter. Simulation and synthesis results for the two designs are then compared. The number of LUTs consumed remains almost the same as in previous implementations, while the number of clock cycles required to process a macroblock is about 40% lower than in the best of the competing proposals.
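To make the data-width argument concrete, the following is a minimal software sketch, not the authors' hardware design. It assumes 8-bit samples, so that one 4×4 block occupies exactly 128 bits, and a row-major packing order chosen here for illustration; the function names are hypothetical. It shows why a 128-bit word exposes both the rows and the columns of a block at once, whereas a 32-bit row word has to be combined with three others before a column can be read.

```python
from typing import List

PIXEL_BITS = 8   # assumption: 8-bit samples
BLOCK = 4        # the filter works on boundaries of 4x4 blocks
ROW_BITS = PIXEL_BITS * BLOCK  # 32 bits: one row of a 4x4 block


def pack_row(row: List[int]) -> int:
    """Pack four 8-bit pixels into one 32-bit word (pixel 0 in the low byte)."""
    word = 0
    for i, p in enumerate(row):
        word |= (p & 0xFF) << (PIXEL_BITS * i)
    return word


def pack_block(block: List[List[int]]) -> int:
    """Pack a 4x4 block into one 128-bit word (row 0 in the low 32 bits)."""
    word = 0
    for r, row in enumerate(block):
        word |= pack_row(row) << (ROW_BITS * r)
    return word


def pixel(word128: int, r: int, c: int) -> int:
    """Extract the pixel at row r, column c from a packed 128-bit block."""
    return (word128 >> (ROW_BITS * r + PIXEL_BITS * c)) & 0xFF


def row_of(word128: int, r: int) -> List[int]:
    """All four pixels of row r, taken from a single 128-bit word."""
    return [pixel(word128, r, c) for c in range(BLOCK)]


def col_of(word128: int, c: int) -> List[int]:
    """All four pixels of column c, taken from the same 128-bit word."""
    return [pixel(word128, r, c) for r in range(BLOCK)]


def words_to_read_column(word_bits: int) -> int:
    """Words touched to fetch one column when rows are stored contiguously."""
    rows_per_word = max(1, word_bits // ROW_BITS)
    return -(-BLOCK // rows_per_word)  # ceiling division


# Example block whose pixel value is simply its raster index 0..15.
block = [[r * BLOCK + c for c in range(BLOCK)] for r in range(BLOCK)]
w128 = pack_block(block)
```

With this layout, `words_to_read_column(32)` is 4 and `words_to_read_column(128)` is 1: a column needed for filtering in the other direction is already present in a single 128-bit word, which is the intuition behind dropping a separate transpose stage.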