William Kolodziejski, R. Domanski, M. Porto, B. Zatt, L. Agostini
{"title":"探索近似计算的超高清AV1 FME插值体系结构","authors":"William Kolodziejski, R. Domanski, M. Porto, B. Zatt, L. Agostini","doi":"10.29292/jics.v17i2.558","DOIUrl":null,"url":null,"abstract":"Modern video encoders like the AOMedia Video 1 (AV1) implement several complex tools to allow the required high level of compression efficiency. The Fractional Motion Estimation (FME) is one of these complex tools, and AV1 FME defines 42 different interpolation filters. To handle such complexity, hardware acceleration using approximate computing has become an interesting alternative to be explored. This paper presents three optimized approximate architectures for the AV1 FME interpolation filters. The architectures reach real time interpolation for UHD 4K videos at 30 frames per second in a low cost, low power, and memory-efficient design. The architectures were synthesized for a 40nm TSMC standard-cells technology reaching power gains up to 83%, when compared to a precise architecture, and up to 20% when compared to a previously published approximated solution. The area gains were also expressive: up to 83% and 40%, respectively. The architectures also allow a memory bandwidth reduction of up to 59.5%, in comparison with the state-of-the-art solutions. The approximations implied small coding efficiency degradation of 0.54% and 1.25% in BD-BR. The presented architectures have the best results found in the literature when considering the trade-off among hardware cost, power dissipation, processing rate, memory bandwidth, and coding efficiency.","PeriodicalId":39974,"journal":{"name":"Journal of Integrated Circuits and Systems","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Ultra-High Definition AV1 FME Interpolation Architectures Exploring Approximate Computing\",\"authors\":\"William Kolodziejski, R. Domanski, M. Porto, B. Zatt, L. Agostini\",\"doi\":\"10.29292/jics.v17i2.558\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern video encoders like the AOMedia Video 1 (AV1) implement several complex tools to allow the required high level of compression efficiency. The Fractional Motion Estimation (FME) is one of these complex tools, and AV1 FME defines 42 different interpolation filters. To handle such complexity, hardware acceleration using approximate computing has become an interesting alternative to be explored. This paper presents three optimized approximate architectures for the AV1 FME interpolation filters. The architectures reach real time interpolation for UHD 4K videos at 30 frames per second in a low cost, low power, and memory-efficient design. The architectures were synthesized for a 40nm TSMC standard-cells technology reaching power gains up to 83%, when compared to a precise architecture, and up to 20% when compared to a previously published approximated solution. The area gains were also expressive: up to 83% and 40%, respectively. The architectures also allow a memory bandwidth reduction of up to 59.5%, in comparison with the state-of-the-art solutions. The approximations implied small coding efficiency degradation of 0.54% and 1.25% in BD-BR. The presented architectures have the best results found in the literature when considering the trade-off among hardware cost, power dissipation, processing rate, memory bandwidth, and coding efficiency.\",\"PeriodicalId\":39974,\"journal\":{\"name\":\"Journal of Integrated Circuits and Systems\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Integrated Circuits and Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.29292/jics.v17i2.558\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Engineering\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Integrated Circuits and Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.29292/jics.v17i2.558","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Engineering","Score":null,"Total":0}
引用次数: 0
摘要
像amedia video 1 (AV1)这样的现代视频编码器实现了几个复杂的工具来实现所需的高压缩效率。分数运动估计(FME)是这些复杂的工具之一,AV1 FME定义了42种不同的插值滤波器。为了处理这种复杂性,使用近似计算的硬件加速已经成为一种值得探索的有趣替代方案。本文提出了AV1 FME插值滤波器的三种优化近似结构。该架构以低成本、低功耗和高效内存的设计实现了每秒30帧的UHD 4K视频实时插值。这些架构是为40nm台积电标准电池技术合成的,与精确架构相比,功率增益高达83%,与先前公布的近似解决方案相比,功率增益高达20%。面积的增长也很明显:分别高达83%和40%。与最先进的解决方案相比,该架构还允许内存带宽减少高达59.5%。近似结果表明,BD-BR编码效率下降幅度较小,分别为0.54%和1.25%。在考虑硬件成本、功耗、处理速率、内存带宽和编码效率之间的权衡时,所提出的体系结构具有文献中发现的最佳结果。
Modern video encoders like the AOMedia Video 1 (AV1) implement several complex tools to allow the required high level of compression efficiency. The Fractional Motion Estimation (FME) is one of these complex tools, and AV1 FME defines 42 different interpolation filters. To handle such complexity, hardware acceleration using approximate computing has become an interesting alternative to be explored. This paper presents three optimized approximate architectures for the AV1 FME interpolation filters. The architectures reach real time interpolation for UHD 4K videos at 30 frames per second in a low cost, low power, and memory-efficient design. The architectures were synthesized for a 40nm TSMC standard-cells technology reaching power gains up to 83%, when compared to a precise architecture, and up to 20% when compared to a previously published approximated solution. The area gains were also expressive: up to 83% and 40%, respectively. The architectures also allow a memory bandwidth reduction of up to 59.5%, in comparison with the state-of-the-art solutions. The approximations implied small coding efficiency degradation of 0.54% and 1.25% in BD-BR. The presented architectures have the best results found in the literature when considering the trade-off among hardware cost, power dissipation, processing rate, memory bandwidth, and coding efficiency.
期刊介绍:
This journal will present state-of-art papers on Integrated Circuits and Systems. It is an effort of both Brazilian Microelectronics Society - SBMicro and Brazilian Computer Society - SBC to create a new scientific journal covering Process and Materials, Device and Characterization, Design, Test and CAD of Integrated Circuits and Systems. The Journal of Integrated Circuits and Systems is published through Special Issues on subjects to be defined by the Editorial Board. Special issues will publish selected papers from both Brazilian Societies annual conferences, SBCCI - Symposium on Integrated Circuits and Systems and SBMicro - Symposium on Microelectronics Technology and Devices.