{"title":"为无监督室内深度估计自适应学习领域不变特征","authors":"Jiehua Zhang, Liang Li, Chenggang Yan, Zhan Wang, Changliang Xu, Jiyong Zhang, Chuqiao Chen","doi":"10.1145/3672397","DOIUrl":null,"url":null,"abstract":"<p>Predicting depth maps from monocular images has made an impressive performance in the past years. However, most depth estimation methods are trained with paired image-depth map data or multi-view images (e.g., stereo pair and monocular sequence), which suffer from expensive annotation costs and poor transferability. Although unsupervised domain adaptation methods are introduced to mitigate the reliance on annotated data, rare works focus on the unsupervised cross-scenario indoor monocular depth estimation. In this paper, we propose to study the generalization of depth estimation models across different indoor scenarios in an adversarial-based domain adaptation paradigm. Concretely, a domain discriminator is designed for discriminating the representation from source and target domains, while the feature extractor aims to confuse the domain discriminator by capturing domain-invariant features. Further, we reconstruct depth maps from latent representations with the supervision of labeled source data. As a result, the feature extractor learned features possess the merit of both domain-invariant and low source risk, and the depth estimator can deal with the domain shift between source and target domains. We conduct the cross-scenario and cross-dataset experiments on the ScanNet and NYU-Depth-v2 datasets to verify the effectiveness of our method and achieve impressive performance.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"36 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation\",\"authors\":\"Jiehua Zhang, Liang Li, Chenggang Yan, Zhan Wang, Changliang Xu, Jiyong Zhang, Chuqiao Chen\",\"doi\":\"10.1145/3672397\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Predicting depth maps from monocular images has made an impressive performance in the past years. However, most depth estimation methods are trained with paired image-depth map data or multi-view images (e.g., stereo pair and monocular sequence), which suffer from expensive annotation costs and poor transferability. Although unsupervised domain adaptation methods are introduced to mitigate the reliance on annotated data, rare works focus on the unsupervised cross-scenario indoor monocular depth estimation. In this paper, we propose to study the generalization of depth estimation models across different indoor scenarios in an adversarial-based domain adaptation paradigm. Concretely, a domain discriminator is designed for discriminating the representation from source and target domains, while the feature extractor aims to confuse the domain discriminator by capturing domain-invariant features. Further, we reconstruct depth maps from latent representations with the supervision of labeled source data. As a result, the feature extractor learned features possess the merit of both domain-invariant and low source risk, and the depth estimator can deal with the domain shift between source and target domains. We conduct the cross-scenario and cross-dataset experiments on the ScanNet and NYU-Depth-v2 datasets to verify the effectiveness of our method and achieve impressive performance.</p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"36 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3672397\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3672397","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Learning Domain Invariant Features for Unsupervised Indoor Depth Estimation Adaptation
Predicting depth maps from monocular images has made an impressive performance in the past years. However, most depth estimation methods are trained with paired image-depth map data or multi-view images (e.g., stereo pair and monocular sequence), which suffer from expensive annotation costs and poor transferability. Although unsupervised domain adaptation methods are introduced to mitigate the reliance on annotated data, rare works focus on the unsupervised cross-scenario indoor monocular depth estimation. In this paper, we propose to study the generalization of depth estimation models across different indoor scenarios in an adversarial-based domain adaptation paradigm. Concretely, a domain discriminator is designed for discriminating the representation from source and target domains, while the feature extractor aims to confuse the domain discriminator by capturing domain-invariant features. Further, we reconstruct depth maps from latent representations with the supervision of labeled source data. As a result, the feature extractor learned features possess the merit of both domain-invariant and low source risk, and the depth estimator can deal with the domain shift between source and target domains. We conduct the cross-scenario and cross-dataset experiments on the ScanNet and NYU-Depth-v2 datasets to verify the effectiveness of our method and achieve impressive performance.
期刊介绍:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.