Inter-Scale Similarity Guided Cost Aggregation for Stereo Matching

IF 11.1 | CAS Zone 1 (Engineering & Technology) | JCR Q1, Engineering, Electrical & Electronic
IEEE Transactions on Circuits and Systems for Video Technology | Published 2024-09-03 | DOI: 10.1109/TCSVT.2024.3453965

Pengxiang Li, Chengtang Yao, Yunde Jia, Yuwei Wu

Volume 35, Issue 1, pp. 134-147. IEEE Xplore: https://ieeexplore.ieee.org/document/10663688/

Citations: 0

Abstract

Stereo matching aims to estimate 3D geometry by computing disparity from a rectified image pair. Most deep-learning-based stereo matching methods aggregate multi-scale cost volumes computed by downsampling and achieve good performance. However, their effectiveness in fine-grained areas is limited by significant detail loss during downsampling and by the use of fixed weights in upsampling. In this paper, we propose an inter-scale similarity-guided cost aggregation method that dynamically upsamples the cost volumes according to image content. The method consists of two modules: inter-scale similarity measurement and stereo-content-aware cost aggregation. Specifically, inter-scale similarity measurement generates similarity guidance from feature maps at adjacent scales. This guidance, generated from both the reference and target images, is then used by stereo-content-aware cost aggregation to aggregate the cost volumes from low to high resolution. We further split the 3D aggregation into 1D disparity aggregation and 2D spatial aggregation to reduce the computational cost. Experimental results on various benchmarks (e.g., SceneFlow, KITTI, Middlebury, and ETH3D two-view) show that our method achieves consistent performance gains on multiple models (e.g., PSM-Net, HSM-Net, CF-Net, FastAcv, and FastAcvPlus). The code can be found at https://github.com/Pengxiang-Li/issga-stereo.
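The abstract does not give the exact formulation, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: inter-scale similarity is taken to be a softmax over scaled dot products between each high-resolution feature and the 3x3 low-resolution features around its parent pixel, the scale factor is 2, and the 1D disparity kernel is passed in from outside (the paper derives its guidance from both reference and target features; that part is not reproduced here). All function names are hypothetical and are not taken from the issga-stereo repository.

```python
import numpy as np
from itertools import product


def _parent_neighbours(x_low, hh, wh, k, scale):
    """Hypothetical helper: for each of the k*k offsets, yield the low-res values
    in the k x k neighbourhood around every high-res pixel's parent, shape (C, hh, wh)."""
    r = k // 2
    padded = np.pad(x_low, ((0, 0), (r, r), (r, r)), mode="edge")
    ys = np.arange(hh)[:, None] // scale  # parent row of each high-res row
    xs = np.arange(wh)[None, :] // scale  # parent column of each high-res column
    for dy, dx in product(range(k), range(k)):
        yield padded[:, ys + dy, xs + dx]


def inter_scale_similarity(feat_low, feat_high, k=3, scale=2):
    """Assumed similarity: softmax over scaled dot products between each high-res
    feature and the k x k low-res features around its parent.
    Returns weights of shape (k*k, Hh, Wh) that sum to 1 over axis 0."""
    c, hh, wh = feat_high.shape
    sims = np.stack([(n * feat_high).sum(axis=0) / np.sqrt(c)
                     for n in _parent_neighbours(feat_low, hh, wh, k, scale)])
    sims -= sims.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(sims)
    return w / w.sum(axis=0, keepdims=True)


def guided_spatial_upsample(cost_low, weights, k=3, scale=2):
    """2D step: each high-res cost vector is a similarity-weighted sum of the
    k x k low-res cost vectors around its parent. (D, Hl, Wl) -> (D, Hh, Wh)."""
    _, hh, wh = weights.shape
    neigh = _parent_neighbours(cost_low, hh, wh, k, scale)
    return sum(w * n for w, n in zip(weights, neigh))


def disparity_aggregate(cost, disp_weights, scale=2):
    """1D step: repeat along disparity onto the finer disparity grid, then apply
    a per-pixel kd-tap kernel along disparity. disp_weights: (kd, H, W)."""
    kd = disp_weights.shape[0]
    r = kd // 2
    rep = np.repeat(cost, scale, axis=0)  # (D*scale, H, W)
    padded = np.pad(rep, ((r, r), (0, 0), (0, 0)), mode="edge")
    n = rep.shape[0]
    return sum(disp_weights[i] * padded[i:i + n] for i in range(kd))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_low = rng.standard_normal((8, 4, 5))    # (C, Hl, Wl)
    feat_high = rng.standard_normal((8, 8, 10))  # (C, 2*Hl, 2*Wl)
    cost_low = rng.random((6, 4, 5))             # (D, Hl, Wl)
    w = inter_scale_similarity(feat_low, feat_high)
    disp_w = np.full((3, 8, 10), 1.0 / 3.0)      # placeholder disparity kernel
    cost_high = disparity_aggregate(guided_spatial_upsample(cost_low, w), disp_w)
    print(cost_high.shape)  # (12, 8, 10)
```

The factorization is what keeps the cost down: a joint 3D kernel would need k*k*kd weights per output location, whereas the separate 2D spatial and 1D disparity kernels need only k*k + kd (9 + 3 instead of 27 for 3-tap kernels). In this sketch the spatial step also runs on the coarser disparity grid before the disparity step doubles it.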

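Source journal: IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 27.40%

Articles published: 660

Review time: 5 months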

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.
