Ordinal Depth Classification Using Region-based Self-attention
Minh-Hieu Phan, S. L. Phung, A. Bouzerdoum
2020 25th International Conference on Pattern Recognition (ICPR), pp. 3620-3627. Published 2021-01-10. DOI: 10.1109/ICPR48806.2021.9412477
Depth perception is essential for scene understanding, autonomous navigation, and augmented reality. Depth estimation from a single 2D image is challenging due to the lack of reliable cues, such as stereo correspondences and motion. Modern approaches exploit multi-scale feature extraction to provide more powerful representations for deep networks. However, these studies combine the extracted multi-scale features only by simple addition or concatenation. This paper proposes a novel region-based self-attention (rSA) unit for effective feature fusion. The rSA unit recalibrates the multi-scale responses by explicitly modelling the dependencies between channels in separate image regions. We discretize continuous depths to formulate an ordinal depth classification problem in which the relative order between categories is preserved. The experiments are performed on a dataset of 4410 RGB-D images captured in outdoor environments on the University of Wollongong's campus. The proposed module improves model performance on small datasets by 22% to 40%.
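The abstract describes the rSA unit as recalibrating channel responses within separate image regions. A minimal numpy sketch of that idea is given below, assuming a squeeze-and-excite style gating applied per region; the grid size, gating form, and weights here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def region_channel_recalibrate(feat, grid=2, rng=None):
    """Recalibrate channels independently within each image region.

    feat: (C, H, W) feature map; the map is split into grid x grid regions.
    The gating weights are random here for illustration; in the actual
    model they would be learned.
    """
    C, H, W = feat.shape
    rng = np.random.default_rng(0) if rng is None else rng
    W1 = rng.standard_normal((C, C)) * 0.1  # toy excitation weights
    out = feat.astype(np.float64).copy()
    hs, ws = H // grid, W // grid
    for i in range(grid):
        for j in range(grid):
            r = out[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws]
            s = r.mean(axis=(1, 2))      # squeeze: per-region channel descriptor
            g = sigmoid(W1 @ s)          # excite: channel gates in (0, 1)
            out[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws] = r * g[:, None, None]
    return out

x = np.ones((4, 8, 8))
y = region_channel_recalibrate(x)  # same shape, channels rescaled per region
```

The key difference from a standard squeeze-and-excite block is that the channel descriptor and gates are computed per region rather than over the whole feature map, so different regions can emphasise different channels.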
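The ordinal formulation discretizes continuous depth into ordered bins. The sketch below illustrates one common realisation of this idea; the log-spaced (spacing-increasing) bins, bin count, and binary ordinal encoding are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def make_bin_edges(d_min, d_max, num_bins):
    """Log-spaced bin edges, so bins widen with distance."""
    return np.exp(np.linspace(np.log(d_min), np.log(d_max), num_bins + 1))

def depth_to_ordinal(depth, edges):
    """Map continuous depths to bin indices 0..K-1, preserving order."""
    return np.digitize(depth, edges[1:-1])  # interior edges only

def ordinal_encode(labels, num_bins):
    """Encode label k as K-1 binary targets t_i = [k > i] (ordinal regression)."""
    return (labels[..., None] > np.arange(num_bins - 1)).astype(np.float32)

edges = make_bin_edges(0.5, 80.0, 8)       # 8 bins over 0.5 m .. 80 m
depths = np.array([0.6, 5.0, 79.0])
labels = depth_to_ordinal(depths, edges)   # -> [0, 3, 7]
targets = ordinal_encode(labels, 8)        # shape (3, 7)
```

Because each target bit asks "is the depth beyond threshold i?", a misclassification into a neighbouring bin incurs a smaller loss than one into a distant bin, which is how the relative order between categories is preserved.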