{"title":"H.264运动估计算法在VLIW可编程数字信号处理器上的实现","authors":"Hyuchang Im, Wonchul Lee, Wonyong Sung","doi":"10.1109/SIPS.2005.1579883","DOIUrl":null,"url":null,"abstract":"We studied the efficient implementation of a motion estimation algorithm for H.264/AVC on TMS 320C64x, a VLIW (very long instruction word) SIMD (single instruction multiple data) digital signal processor. H.264 motion estimation algorithms demand much arithmetic operations especially because of the variable block size optimization. The SAD (sum of absolute difference) reuse method is chosen not only to reduce the computation but also to utilize the regular algorithmic structure, which is essential for efficient implementation in parallel and pipelined processors. We applied a few techniques, such as loop length increase for efficient software pipelining, multi-block SAD computation for reducing memory access overhead, block processing for cache miss minimization, and improved quarter-pixel processing. The implementation results show that a real-time implementation of ME for D1 size (720*480) video is possible using a 720 MHz TMS320C6416 digital signal processor.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Implementation of an H.264 motion estimation algorithm on a VLIW programmable digital signal processor\",\"authors\":\"Hyuchang Im, Wonchul Lee, Wonyong Sung\",\"doi\":\"10.1109/SIPS.2005.1579883\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We studied the efficient implementation of a motion estimation algorithm for H.264/AVC on TMS 320C64x, a VLIW (very long instruction word) SIMD (single instruction multiple data) digital signal processor. H.264 motion estimation algorithms demand much arithmetic operations especially because of the variable block size optimization. The SAD (sum of absolute difference) reuse method is chosen not only to reduce the computation but also to utilize the regular algorithmic structure, which is essential for efficient implementation in parallel and pipelined processors. We applied a few techniques, such as loop length increase for efficient software pipelining, multi-block SAD computation for reducing memory access overhead, block processing for cache miss minimization, and improved quarter-pixel processing. The implementation results show that a real-time implementation of ME for D1 size (720*480) video is possible using a 720 MHz TMS320C6416 digital signal processor.\",\"PeriodicalId\":436123,\"journal\":{\"name\":\"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.\",\"volume\":\"54 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SIPS.2005.1579883\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIPS.2005.1579883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Implementation of an H.264 motion estimation algorithm on a VLIW programmable digital signal processor
We studied the efficient implementation of a motion estimation algorithm for H.264/AVC on TMS 320C64x, a VLIW (very long instruction word) SIMD (single instruction multiple data) digital signal processor. H.264 motion estimation algorithms demand much arithmetic operations especially because of the variable block size optimization. The SAD (sum of absolute difference) reuse method is chosen not only to reduce the computation but also to utilize the regular algorithmic structure, which is essential for efficient implementation in parallel and pipelined processors. We applied a few techniques, such as loop length increase for efficient software pipelining, multi-block SAD computation for reducing memory access overhead, block processing for cache miss minimization, and improved quarter-pixel processing. The implementation results show that a real-time implementation of ME for D1 size (720*480) video is possible using a 720 MHz TMS320C6416 digital signal processor.