Authors: G. Saldaha, M. Arias-Estrada
Venue: 2007 3rd Southern Conference on Programmable Logic
Published: 2007-02-01
DOI: 10.1109/SPL.2007.371756 (https://doi.org/10.1109/SPL.2007.371756)
Compact FPGA-Based Systolic Array Architecture for Motion Estimation Using Full Search Block Matching
Motion estimation accounts for a significant share of the computation in video compression standards such as MPEG-4. This work presents a reconfigurable systolic architecture that implements the full search block matching algorithm, which is computationally intensive and requires high memory bandwidth to achieve real-time performance. The architecture includes a smart memory scheme that reduces the number of accesses to image memory, and router elements that handle data movement among the different structures inside the architecture, which also makes it possible to chain multiple processing blocks together. Every processing element (PE) in the array includes a double ALU so that multiple macroblocks can be searched in parallel. Results show that a peak performance on the order of 9 GOPS can be achieved.
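To make the workload concrete, the following is a minimal software sketch of full search block matching. It is not the systolic architecture described in the abstract. It is only a sequential reference model of the algorithm that such hardware parallelizes. The sum of absolute differences (SAD) is assumed as the matching criterion, since the abstract does not name one, and the function names, parameters, and synthetic frames are all hypothetical.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between two n x n blocks.

    (rx, ry) is the top-left corner of the candidate block in `ref`,
    (cx, cy) is the top-left corner of the block in `cur`.
    """
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(ref, cur, cx, cy, n, p):
    """Test every displacement in [-p, p] x [-p, p] exhaustively.

    Returns (dx, dy, best_sad) for the n x n block of `cur` at (cx, cy).
    Candidates that fall outside `ref` are skipped; ties keep the first
    candidate found in scan order.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block is not fully inside the frame
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic 16 x 16 reference frame in which every pixel value is unique.
W, H = 16, 16
ref = [[y * W + x for x in range(W)] for y in range(H)]
# Current frame: the reference content moved 2 pixels right and 1 pixel down.
cur = [[ref[y - 1][x - 2] if (y >= 1 and x >= 2) else 0 for x in range(W)]
       for y in range(H)]

print(full_search(ref, cur, cx=6, cy=6, n=4, p=3))  # prints (-2, -1, 0)
```

For an n x n block and a search range of ±p, the loops evaluate up to (2p+1)^2 candidates, each requiring n^2 absolute differences. This regular, data-independent pattern is what makes the algorithm both expensive in software and well suited to an array of processing elements.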