Zhaoxu Tian
DOI: 10.1117/12.3032052
Published 2024-06-06, Other Conferences
Multiview stereo reconstruction based on context-aware transformer
This paper tackles challenges inherent in existing Multi-View Stereo (MVS) methods, which often struggle with repetitive textures and complex scenes, yielding reconstructions that fall short in quality, completeness, and accuracy. To address these issues, we introduce Clo-PatchmatchNet, a deep learning network that leverages context-aware Transformers. The architecture begins with a feature extraction module that processes image features; these features feed a learnable Patchmatch algorithm that produces an initial depth map, which is then refined into the final, detailed depth map. A key innovation is the integration of a context-aware Transformer block, Cloblock, into the feature extraction stage, allowing the network to capture both global contextual information and high-frequency local details and thereby improving feature matching across views. Experiments on the Technical University of Denmark (DTU) dataset show that Clo-PatchmatchNet outperforms the traditional PatchmatchNet, improving reconstruction completeness by 2.5% and accuracy by 1.2%, for an overall gain of 1.7%. Moreover, compared with other contemporary methods, the proposed approach achieves superior completeness and overall quality, marking a significant advance in 3D reconstruction.
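To make the Patchmatch stage concrete, the sketch below illustrates the classic PatchMatch idea that the learnable variant builds on: each pixel holds a depth hypothesis, and every iteration alternates spatial propagation (borrowing a neighbour's hypothesis) with a small random search, keeping whichever hypothesis scores the lowest matching cost. This is a toy pure-Python illustration under stated assumptions, not the paper's implementation — the cost function, grid representation, and parameter names are hypothetical, and the real network replaces the hand-crafted cost with learned features.

```python
import random

def patchmatch_depth(cost, depths, iterations=3, seed=0):
    """Toy PatchMatch-style depth refinement on a 2-D grid (illustrative only).

    cost(y, x, d) scores how well depth hypothesis d matches at pixel (y, x)
    (lower is better); depths is an initial H x W grid of hypotheses.
    """
    rng = random.Random(seed)
    h, w = len(depths), len(depths[0])
    for _ in range(iterations):
        for y in range(h):
            for x in range(w):
                best = depths[y][x]
                best_cost = cost(y, x, best)
                # Spatial propagation: try the left and top neighbours'
                # hypotheses, which a raster scan has already updated.
                for ny, nx in ((y, x - 1), (y - 1, x)):
                    if ny >= 0 and nx >= 0:
                        cand = depths[ny][nx]
                        c = cost(y, x, cand)
                        if c < best_cost:
                            best, best_cost = cand, c
                # Random search: perturb the current best hypothesis.
                cand = best + rng.uniform(-0.5, 0.5)
                c = cost(y, x, cand)
                if c < best_cost:
                    best, best_cost = cand, c
                depths[y][x] = best
    return depths

# Hypothetical usage: a synthetic cost that is minimised at depth 2.0,
# so refinement should pull every hypothesis toward that value.
true_depth = 2.0
grid = [[0.0] * 4 for _ in range(4)]
result = patchmatch_depth(lambda y, x, d: abs(d - true_depth), grid, iterations=10)
```

Because a hypothesis is only replaced when its cost strictly decreases, the per-pixel cost is monotonically non-increasing across iterations; the raster scan lets a good hypothesis found at one pixel cascade rightward and downward within a single sweep.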