Jiang Xiang, Ziyun Li, Hun-Seok Kim, C. Chakrabarti
Hardware-Efficient Neighbor-Guided SGM Optical Flow for Low Power Vision Applications
2016 IEEE International Workshop on Signal Processing Systems (SiPS), published 2016-12-09
DOI: 10.1109/SiPS.2016.8 (https://doi.org/10.1109/SiPS.2016.8)
Citations: 6
Abstract
Many real-time vision applications require accurate estimation of optical flow. This problem is challenging because of its extremely high computation and memory bandwidth requirements. This paper presents a parallel block-based optical flow algorithm along with an optimized multicore hardware architecture. The algorithm is based on neighbor-guided semi-global matching (NG-fSGM), a dynamic programming algorithm that aggressively prunes the search space using flow vector information from neighboring pixels. In the block-based NG-fSGM, the image is divided into overlapping blocks, and the blocks are processed in parallel for high throughput. While a large overlap between blocks improves accuracy, it also increases memory requirements and computational complexity. To minimize the amount of overlap among blocks with minimal effect on accuracy, we use temporal prediction to guide flow vectors along the block boundaries. A pseudo-random flow candidate selection technique is also introduced to reduce memory access bandwidth and computation requirements. The proposed algorithm is mapped onto a multicore architecture in which each core has a high degree of internal parallelism and implements a prefetching technique to improve throughput and reduce memory latency. The proposed hardware-efficient algorithm and the corresponding architecture achieve significant gains in throughput, latency, and power efficiency with only 1.25% accuracy degradation compared to the original NG-fSGM when evaluated on the Middlebury dataset.
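The trade-off between block overlap and cost described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the square block shape, and the symmetric overlap margin are all assumptions made for illustration. It only shows how extending each block by an overlap margin increases the total number of pixels that must be stored and processed.

```python
def split_into_overlapping_blocks(height, width, block, overlap):
    """Return (top, bottom, left, right) bounds for each block.

    Hypothetical tiling: the image is cut into `block` x `block` tiles, and
    each tile is extended by `overlap` pixels on every side, clipped to the
    image borders. Bounds are half-open: rows top..bottom-1, cols left..right-1.
    """
    blocks = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            blocks.append((
                max(top - overlap, 0),
                min(top + block + overlap, height),
                max(left - overlap, 0),
                min(left + block + overlap, width),
            ))
    return blocks


def overhead(height, width, block, overlap):
    """Ratio of pixels processed over all blocks to pixels in the image."""
    total = sum(
        (bottom - top) * (right - left)
        for top, bottom, left, right
        in split_into_overlapping_blocks(height, width, block, overlap)
    )
    return total / (height * width)


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    for margin in (0, 8, 16, 32):
        print(margin, overhead(480, 640, 160, margin))
```

With zero overlap the ratio is exactly 1.0, and it grows as the margin grows, which is the memory and computation cost the abstract refers to. The temporal prediction along block boundaries that the paper uses to keep this margin small is not modeled here.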