VAT：用于细粒度服装人体重建的可视性感知变压器。

IF 6.5 IEEE transactions on visualization and computer graphics Pub Date : 2025-01-10 DOI:10.1109/TVCG.2025.3528021

Xiaoyan Zhang;Zibin Zhu;Hong Xie;Sisi Ren;Jianmin Jiang

{"title":"VAT：用于细粒度服装人体重建的可视性感知变压器。","authors":"Xiaoyan Zhang;Zibin Zhu;Hong Xie;Sisi Ren;Jianmin Jiang","doi":"10.1109/TVCG.2025.3528021","DOIUrl":null,"url":null,"abstract":"In order to reconstruct 3D clothed human with accurate fine-grained details from sparse views, we propose a deep cooperating two-level global to fine-grained reconstruction framework that constructs robust global geometry to guide fine-grained geometry learning. The core of the framework is a novel visibility aware Transformer VAT, which bridges the two-level reconstruction architecture by connecting its global encoder and fine-grained decoder with two pixel-aligned implicit functions, respectively. The global encoder fuses semantic features of multiple views to integrate global geometric features. In the fine-grained decoder, visibility aware attention mechanism is designed to efficiently fuse multi-view and multi-scale features for mining fine-grained geometric features. The global encoder and fine-grained decoder are connected by a global embeding module to form a deep cooperation in the two-level framework, which provides global geometric embedding as a query guidance for calculating visibility aware attention in the fine-grained decoder. In addition, to extract highly aligned multi-scale features for the two-level reconstruction architecture, we design an image feature extractor MSUNet, which establishes strong semantic connections between different scales at minimal cost. Our proposed framework is end-to-end trainable, with all modules jointly optimized. We validate the effectiveness of our framework on public benchmarks, and experimental results demonstrate that our method has significant advantages over state-of-the-art methods in terms of both fine-grained performance and generalization.","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"31 10","pages":"6719-6736"},"PeriodicalIF":6.5000,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"VAT: Visibility Aware Transformer for Fine-Grained Clothed Human Reconstruction\",\"authors\":\"Xiaoyan Zhang;Zibin Zhu;Hong Xie;Sisi Ren;Jianmin Jiang\",\"doi\":\"10.1109/TVCG.2025.3528021\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In order to reconstruct 3D clothed human with accurate fine-grained details from sparse views, we propose a deep cooperating two-level global to fine-grained reconstruction framework that constructs robust global geometry to guide fine-grained geometry learning. The core of the framework is a novel visibility aware Transformer VAT, which bridges the two-level reconstruction architecture by connecting its global encoder and fine-grained decoder with two pixel-aligned implicit functions, respectively. The global encoder fuses semantic features of multiple views to integrate global geometric features. In the fine-grained decoder, visibility aware attention mechanism is designed to efficiently fuse multi-view and multi-scale features for mining fine-grained geometric features. The global encoder and fine-grained decoder are connected by a global embeding module to form a deep cooperation in the two-level framework, which provides global geometric embedding as a query guidance for calculating visibility aware attention in the fine-grained decoder. In addition, to extract highly aligned multi-scale features for the two-level reconstruction architecture, we design an image feature extractor MSUNet, which establishes strong semantic connections between different scales at minimal cost. Our proposed framework is end-to-end trainable, with all modules jointly optimized. We validate the effectiveness of our framework on public benchmarks, and experimental results demonstrate that our method has significant advantages over state-of-the-art methods in terms of both fine-grained performance and generalization.\",\"PeriodicalId\":94035,\"journal\":{\"name\":\"IEEE transactions on visualization and computer graphics\",\"volume\":\"31 10\",\"pages\":\"6719-6736\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on visualization and computer graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10836818/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10836818/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

为了从稀疏视图中获得精确的细粒度细节，我们提出了一个深度合作的两级全局到细粒度重建框架，该框架构建了鲁棒的全局几何来指导细粒度几何学习。该框架的核心是一种新颖的可视性感知的Transformer VAT，它通过将其全局编码器和细粒度解码器分别与两个像素对齐的隐式函数连接在一起，架起了两级重构体系结构的桥梁。全局编码器通过融合多个视图的语义特征来整合全局几何特征。在细粒度解码器中，设计了可见感知注意机制，有效融合多视角、多尺度特征，挖掘细粒度几何特征。全局编码器和细粒度解码器通过全局嵌入模块连接，在两级框架中形成深度合作，为细粒度解码器中可见性感知注意力的计算提供全局几何嵌入作为查询指导。此外，为了为两级重建架构提取高度对齐的多尺度特征，我们设计了一种图像特征提取器MSUNet，以最小的成本在不同尺度之间建立强语义连接。我们提出的框架是端到端可训练的，所有模块都是联合优化的。我们在公共基准测试上验证了我们的框架的有效性，实验结果表明，我们的方法在细粒度性能和泛化方面都比最先进的方法具有显著的优势。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

VAT: Visibility Aware Transformer for Fine-Grained Clothed Human Reconstruction

In order to reconstruct 3D clothed human with accurate fine-grained details from sparse views, we propose a deep cooperating two-level global to fine-grained reconstruction framework that constructs robust global geometry to guide fine-grained geometry learning. The core of the framework is a novel visibility aware Transformer VAT, which bridges the two-level reconstruction architecture by connecting its global encoder and fine-grained decoder with two pixel-aligned implicit functions, respectively. The global encoder fuses semantic features of multiple views to integrate global geometric features. In the fine-grained decoder, visibility aware attention mechanism is designed to efficiently fuse multi-view and multi-scale features for mining fine-grained geometric features. The global encoder and fine-grained decoder are connected by a global embeding module to form a deep cooperation in the two-level framework, which provides global geometric embedding as a query guidance for calculating visibility aware attention in the fine-grained decoder. In addition, to extract highly aligned multi-scale features for the two-level reconstruction architecture, we design an image feature extractor MSUNet, which establishes strong semantic connections between different scales at minimal cost. Our proposed framework is end-to-end trainable, with all modules jointly optimized. We validate the effectiveness of our framework on public benchmarks, and experimental results demonstrate that our method has significant advantages over state-of-the-art methods in terms of both fine-grained performance and generalization.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE transactions on visualization and computer graphics

自引率

0.00%

发文量