Most learning-based Multi-View Stereo (MVS) methods focus on accurate depth inference to obtain precise and complete point clouds, where cost aggregation plays a crucial role as a bridge between two-dimensional (2D) images and three-dimensional (3D) representations. While achieving competitive results, conventional MVS methods often adopt cascaded architectures, which risk error propagation, or they neglect the optimization of depth geometry. To address this, we propose the Epipolar Transformer and Quadruplex-Depth based MVS Network (ETQ-MVSNet), built upon a progressive refinement framework. Our core innovation is the Quadruplex-Depth (QD) mechanism, which predicts four depth values per pixel and constrains them to form a novel wave-shaped depth geometry. This is complemented by an adaptive initial depth range determination strategy within the Quadruplex-Depth Refinement (QDR) process to reduce prediction deviation. By preemptively modeling the wave-shaped depth map within the prediction network, our method reduces interpolated depth deviation during the depth fusion phase, significantly enhancing the overall coherence of the reconstruction pipeline and improving reconstruction quality. To complement the QD mechanism, which requires double regularization due to its wave-shaped cells, we also incorporate an Epipolar Transformer (ET) for visibility-aware cost aggregation, capturing robust long-range 3D relationships along epipolar lines, and an efficient multi-scale feature extraction network that processes images collectively and extracts informative, high-quality features for all pipeline modules in a single pass. These two designs not only keep the pipeline efficient, enhancing its practical utility, but also improve reconstruction quality in non-ideal scenes. ETQ-MVSNet not only surpasses all previous progressive refinement approaches but also achieves competitive results against state-of-the-art cascaded methods, demonstrating its effectiveness, time efficiency, generalization ability, and strong scalability. Our method can be extended to reconstruct images captured by mobile phones or Unmanned Aerial Vehicles (UAVs) in applications such as digital heritage conservation and city surveying. The code will be available at https://github.com/Boyang-Song/ETQ-MVSNet.
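To make the QD idea concrete, below is a minimal PyTorch sketch of a head that predicts four ordered depth values per pixel. The abstract does not describe the actual layer design, so the class name QuadruplexDepthHead, the base-depth-plus-positive-gaps parameterization, and all tensor shapes are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (not the authors' code): a per-pixel quadruplex-depth head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuadruplexDepthHead(nn.Module):
    """Predicts four ordered depth hypotheses per pixel.

    Assumption: the four values are parameterized as a base depth plus three
    cumulative non-negative offsets, which keeps them sorted so that
    neighboring pixels could be interleaved into a wave-shaped depth map
    by a downstream stage.
    """

    def __init__(self, in_channels: int):
        super().__init__()
        # 4 output channels: one raw base depth + three raw gap logits.
        self.conv = nn.Conv2d(in_channels, 4, kernel_size=3, padding=1)

    def forward(self, cost_features: torch.Tensor) -> torch.Tensor:
        raw = self.conv(cost_features)          # (B, 4, H, W)
        base = raw[:, :1]                       # base depth per pixel
        gaps = F.softplus(raw[:, 1:])           # three positive gaps
        # Cumulative sum yields four monotonically increasing depths.
        depths = torch.cat([base, base + torch.cumsum(gaps, dim=1)], dim=1)
        return depths                           # (B, 4, H, W)

# Usage: feed aggregated cost-volume features, e.g. from a cost-aggregation
# module such as the ET block described in the abstract.
feats = torch.randn(2, 32, 64, 80)
quad_depths = QuadruplexDepthHead(32)(feats)
print(quad_depths.shape)  # torch.Size([2, 4, 64, 80])
```

The ordering constraint is one plausible way to realize "four depth values per pixel" with a stable geometric structure; the paper's actual wave-shaped constraint may be enforced differently.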
