{"title":"A High-Throughput and Memory-Efficient Deblocking Filter Hardware Architecture for VVC","authors":"Bingjing Hou;Leilei Huang;Minge Jing;Yibo Fan","doi":"10.1109/TCSVT.2024.3447698","DOIUrl":null,"url":null,"abstract":"Video coding has become more and more important since high-resolution and high-quality videos have been used in a variety of application areas. Deblocking filter (DBF) is a video coding technology which can improve both video quality and coding efficiency. However, its hardware architecture design suffers from huge computations and high memory requirements. Moreover, the latest Versatile Video Coding (VVC) standard extends DBF with several complex enhancements, which makes the design more difficult. In this paper, a high-throughput and memory-efficient DBF hardware architecture for VVC systems is presented. By analyz-ing the DBF algorithm, we firstly propose a unified filter core to perform edge filtering process with low complexity, and two resource sharing techniques are utilized to reduce hardware costs. Furthermore, we propose a whole DBF architecture to process all the edges in a coding tree unit (CTU). To improve its throughput, we propose novel pre-calculation processing flow and double processing flow to fully utilize pipelining and parallel processing techniques. At the same time, to reduce its memory requirements, we propose four novel data reuse approaches to fully utilize intermediate data reusabilities. Synthesis results show that our proposed hardware architecture can support real-time VVC DBF processing of \n<inline-formula> <tex-math>$7680\\times 4320$ </tex-math></inline-formula>\n at 158 frames/s at 500 MHz working frequency. The hardware costs are only 163.2k gate count and three two-port on-chip SRAMs with data width of 128 bits and depth of 32. 
Compared with other state-of-the-art works for previous standards, our proposed VVC DBF hardware architecture achieves good results in performance, area efficiency and memory efficiency.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"34 12","pages":"13569-13583"},"PeriodicalIF":8.3000,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10643606/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Video coding has become increasingly important as high-resolution, high-quality video is used in a growing range of application areas. The deblocking filter (DBF) is a video coding tool that improves both video quality and coding efficiency. However, its hardware architecture design suffers from heavy computation and high memory requirements. Moreover, the latest Versatile Video Coding (VVC) standard extends the DBF with several complex enhancements, which makes the design more difficult. In this paper, a high-throughput and memory-efficient DBF hardware architecture for VVC systems is presented. By analyzing the DBF algorithm, we first propose a unified filter core that performs the edge filtering process with low complexity, and we apply two resource-sharing techniques to reduce hardware cost. Furthermore, we propose a complete DBF architecture that processes all the edges in a coding tree unit (CTU). To improve its throughput, we propose a novel pre-calculation processing flow and a double processing flow that fully exploit pipelining and parallel processing. At the same time, to reduce its memory requirements, we propose four novel data reuse approaches that fully exploit the reusability of intermediate data. Synthesis results show that the proposed hardware architecture supports real-time VVC DBF processing of $7680\times 4320$ video at 158 frames/s at a working frequency of 500 MHz. The hardware cost is only a gate count of 163.2k plus three two-port on-chip SRAMs, each with a data width of 128 bits and a depth of 32. Compared with other state-of-the-art works for previous standards, the proposed VVC DBF hardware architecture achieves good results in performance, area efficiency, and memory efficiency.
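To put the reported figures in perspective, the short Python sketch below converts them into a per-cycle sample rate and a total on-chip SRAM size. It is only back-of-the-envelope arithmetic on the numbers quoted in the abstract: the function names are hypothetical, and it counts luma samples only, since the abstract does not state the chroma format.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Function names are illustrative; only luma samples are counted because
# the abstract does not state the chroma format (an assumption of this sketch).

def luma_samples_per_cycle(width, height, fps, clock_hz):
    """Average number of luma samples that must be handled per clock cycle."""
    return width * height * fps / clock_hz

def sram_total_bits(count, width_bits, depth):
    """Total capacity, in bits, of `count` identical SRAMs."""
    return count * width_bits * depth

if __name__ == "__main__":
    rate = luma_samples_per_cycle(7680, 4320, 158, 500e6)
    bits = sram_total_bits(3, 128, 32)
    print(f"luma samples per cycle: {rate:.2f}")
    print(f"on-chip SRAM: {bits} bits ({bits // 8} bytes)")
```

Under these assumptions, the design has to sustain roughly 10.5 luma samples per clock cycle, and the three SRAMs together hold 12,288 bits (1,536 bytes).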
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.