A 39pJ/label 1920x1080 165.7 FPS Block PatchMatch Based Stereo Matching Processor on FPGA
Hongyu Wang, Weiti Zhou, Xiangyu Zhang, Xin Lou
2022 IEEE Custom Integrated Circuits Conference (CICC), published 2022-04-01
DOI: https://doi.org/10.1109/CICC53496.2022.9772830
Citations: 1
Abstract
Depth is fundamental information for many computer vision applications. Stereo matching is a commonly used depth estimation method that mimics the human binocular vision system. Most stereo matching systems struggle to balance accuracy against computational complexity, for two reasons: 1) They usually assume that all surfaces are fronto-parallel, meaning that neighboring pixels share the same disparity, whereas many real-world surfaces, such as roads and walls, are slanted. 2) Conventional methods usually use a winner-takes-all (WTA) strategy [1]–[4], in which the aggregated costs at all disparity levels (usually 128 or 256) must be calculated. However, only one disparity is correct for each pixel position, so most of the calculated costs are wasted. To address these issues, this work proposes a block PatchMatch-based FPGA accelerator for stereo matching. PatchMatch introduces a random search strategy and slant labels, and rectangular superpixels called blocks are used as the basic computing element. The main improvements of this work are: 1) The random search strategy and block-level computation eliminate a large amount of computation. 2) The closer-to-reality slant label improves accuracy. Moreover, the plane slant is also useful for downstream tasks such as 3D reconstruction [5], yet no existing hardware accelerator provides this information. 3) An algorithm-hardware co-optimized 6-point Census feature and multi-scale propagation (MSP) are proposed. 4) Test results on the industrial-level KITTI dataset show that the real-time performance and energy efficiency of the proposed design outperform state-of-the-art FPGA and ASIC designs, with comparable accuracy.
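To make the contrast between WTA and PatchMatch-style search concrete, the following is a minimal Python sketch and not the processor's actual algorithm. It assumes a slant label is a plane `(a, b, c)` giving disparity `d = a*x + b*y + c` inside a block, and it replaces the Census-based matching cost with a toy cost measured against a known synthetic surface. All function names, the 8x4 block size, and the candidate counts are hypothetical choices made for illustration.

```python
import random

def plane_disparity(label, x, y):
    """Disparity at pixel (x, y) under a slant label (a, b, c) (assumed form)."""
    a, b, c = label
    return a * x + b * y + c

def block_cost(label, pixels, surface):
    """Toy cost: summed absolute disparity error over the block.
    A real matcher would compare image features (e.g. Census) instead."""
    return sum(abs(plane_disparity(label, x, y) - surface(x, y)) for x, y in pixels)

def wta_search(pixels, surface, levels=128):
    """Winner-takes-all: evaluate every fronto-parallel disparity level."""
    candidates = [(0.0, 0.0, float(d)) for d in range(levels)]
    best = min(candidates, key=lambda lab: block_cost(lab, pixels, surface))
    return best, block_cost(best, pixels, surface), len(candidates)

def patchmatch_search(pixels, surface, neighbor_labels, rng, n_random=4, levels=128):
    """PatchMatch-style: evaluate labels propagated from neighbors plus a few random ones."""
    candidates = list(neighbor_labels)
    for _ in range(n_random):
        candidates.append((rng.uniform(-0.5, 0.5),
                           rng.uniform(-0.5, 0.5),
                           rng.uniform(0.0, levels - 1.0)))
    best = min(candidates, key=lambda lab: block_cost(lab, pixels, surface))
    return best, block_cost(best, pixels, surface), len(candidates)

# Hypothetical 8x4 block lying on a slanted surface (think of a road or wall).
pixels = [(x, y) for y in range(4) for x in range(8)]
surface = lambda x, y: 0.25 * x + 10.0

wta_label, wta_cost, wta_evals = wta_search(pixels, surface)

# Assume a neighboring block already holds the correct slanted plane.
neighbors = [(0.0, 0.0, 10.0), (0.25, 0.0, 10.0)]
pm_label, pm_cost, pm_evals = patchmatch_search(pixels, surface, neighbors, random.Random(0))

print("WTA:       ", wta_label, wta_cost, wta_evals)
print("PatchMatch:", pm_label, pm_cost, pm_evals)
```

In this toy setup the WTA search evaluates all 128 fronto-parallel levels and still leaves a residual error on the slanted surface, while the PatchMatch-style search evaluates only six candidate labels and recovers the plane exactly because a neighbor already holds it. Real images offer no such guarantee, which is why propagation is iterated and combined with random refinement.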