32-Parallel SAD Tree Hardwired Engine for Variable Block Size Motion Estimation in HDTV1080P Real-Time Encoding Application

Zhenyu Liu, Yang Song, Ming Shao, Shen Li, Lingfeng Li, S. Goto, T. Ikenaga
DOI: 10.1109/SIPS.2007.4387630 (https://doi.org/10.1109/SIPS.2007.4387630)
Published in: Proceedings. IEEE Workshop on Signal Processing Systems (2007-2014), volume 28 1, pages 675-680
Publication date: 2007-11-21
Citations: 29

Abstract

The H.264/AVC coding standard incorporates variable block size (VBS) motion estimation (ME) to improve compression efficiency. For HDTV 1080p applications, the massive computation and the huge memory bandwidth caused by the large frame size and the wide search range are two critical impediments to the design of a real-time hardwired VBSME engine. In this paper, we present six techniques to circumvent these difficulties. First, the inter modes below 8 × 8 are eliminated in our design to reduce the hardware cost. Second, a low-pass filter based 4:1 down-sampling algorithm removes about 75% of the arithmetic computation at each search position. Third, a coarse-to-fine search scheme reduces the number of search candidates by 25%-50%. Fourth, C+ memory organization is adopted to reduce the external I/O bandwidth. Fifth, a horizontal zigzag scan mode optimizes the search window memories. Finally, in circuit design, a 4:2 compressor based CSA tree, multi-cycle path delay, and a 2-pipeline-stage SAD tree are used to improve the speed and reduce the hardware of each SAD tree. A hardwired integer motion estimation (IME) engine with a 192 × 128 search range for HDTV 1080p@30Hz is demonstrated in this paper. In TSMC 0.18 µm 1P6M CMOS technology, it is implemented with 485.7k gates of standard cells and 327.68k bits of on-chip memory. The power dissipation is 729 mW at a 200 MHz clock speed.
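To make two of the ideas in the abstract concrete, the following is a minimal software sketch, not the authors' hardware design. It shows how a variable block size SAD computation can touch pixels only for the four 8 × 8 quadrants of a 16 × 16 macroblock and derive the larger partitions by addition, and how 4:1 down-sampling leaves a quarter of the pixels to compare. The function names are hypothetical, and the 2 × 2 averaging filter is an assumption, since the abstract does not specify which low-pass filter is used.

```python
def sad_block(cur, ref, x0, y0, w, h):
    """Sum of absolute differences over a w x h block with top-left corner (x0, y0)."""
    return sum(
        abs(cur[y][x] - ref[y][x])
        for y in range(y0, y0 + h)
        for x in range(x0, x0 + w)
    )


def vbs_sads(cur_mb, ref_mb):
    """SADs for the 8x8, 16x8, 8x16 and 16x16 partitions of a 16x16 macroblock.

    Only the four 8x8 SADs read pixels; larger partitions reuse them by addition.
    """
    # Quadrants in raster order: top-left, top-right, bottom-left, bottom-right.
    s = [sad_block(cur_mb, ref_mb, x, y, 8, 8) for y in (0, 8) for x in (0, 8)]
    return {
        "8x8": s,
        "16x8": [s[0] + s[1], s[2] + s[3]],  # top half, bottom half
        "8x16": [s[0] + s[2], s[1] + s[3]],  # left half, right half
        "16x16": sum(s),
    }


def downsample_4to1(block):
    """Halve both dimensions by averaging each 2x2 neighbourhood.

    The rounded 2x2 mean is an assumed stand-in for the paper's low-pass filter.
    Block height and width must be even.
    """
    h, w = len(block), len(block[0])
    return [
        [
            (block[y][x] + block[y][x + 1]
             + block[y + 1][x] + block[y + 1][x + 1] + 2) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


# Usage: synthetic 16x16 current and reference macroblocks.
cur = [[3 * x + 5 * y for x in range(16)] for y in range(16)]
ref = [[3 * x + 5 * y + 2 for x in range(16)] for y in range(16)]

sads = vbs_sads(cur, ref)
print(sads["16x16"], sads["16x8"], sads["8x16"], sads["8x8"])

small_cur = downsample_4to1(cur)
print(len(small_cur), len(small_cur[0]))  # pixel count drops to one quarter
```

Because each down-sampled block holds one quarter of the original pixels, a SAD over it needs roughly one quarter of the absolute-difference operations, which is in line with the roughly 75% reduction per search position quoted in the abstract.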