{"title":"卷积神经网络深度推理","authors":"Hu Tian, Bojin Zhuang, Yan Hua, A. Cai","doi":"10.1109/VCIP.2014.7051531","DOIUrl":null,"url":null,"abstract":"The goal of depth inference from a single image is to assign a depth to each pixel in the image according to the image content. In this paper, we propose a deep learning model for this task. This model consists of a convolutional neural network (CNN) with a linear regressor being as the last layer. The network is trained with raw RGB image patches cropped by a large window centered at each pixel of an image to extract feature representations. Then the depth map of a test image can be efficiently obtained by forward-passing the image through the trained model plus a simple up-sampling. Contrary to most previous methods based on graphical model and depth sampling, our method alleviates the needs for engineered features and for assumptions about semantic information of the scene. We achieve state-of-the-art results on Make 3D dataset, while keeping low computational time at the test time.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Depth inference with convolutional neural network\",\"authors\":\"Hu Tian, Bojin Zhuang, Yan Hua, A. Cai\",\"doi\":\"10.1109/VCIP.2014.7051531\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The goal of depth inference from a single image is to assign a depth to each pixel in the image according to the image content. In this paper, we propose a deep learning model for this task. This model consists of a convolutional neural network (CNN) with a linear regressor being as the last layer. The network is trained with raw RGB image patches cropped by a large window centered at each pixel of an image to extract feature representations. Then the depth map of a test image can be efficiently obtained by forward-passing the image through the trained model plus a simple up-sampling. Contrary to most previous methods based on graphical model and depth sampling, our method alleviates the needs for engineered features and for assumptions about semantic information of the scene. 
We achieve state-of-the-art results on Make 3D dataset, while keeping low computational time at the test time.\",\"PeriodicalId\":166978,\"journal\":{\"name\":\"2014 IEEE Visual Communications and Image Processing Conference\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE Visual Communications and Image Processing Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/VCIP.2014.7051531\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE Visual Communications and Image Processing Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VCIP.2014.7051531","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The goal of depth inference from a single image is to assign a depth to each pixel of the image according to its content. In this paper, we propose a deep learning model for this task. The model consists of a convolutional neural network (CNN) with a linear regressor as the last layer. The network is trained on raw RGB image patches, cropped by a large window centered at each pixel of an image, to extract feature representations. The depth map of a test image can then be obtained efficiently by forward-passing the image through the trained model followed by a simple up-sampling. In contrast to most previous methods based on graphical models and depth sampling, our method alleviates the need for engineered features and for assumptions about the semantic content of the scene. We achieve state-of-the-art results on the Make3D dataset while keeping the computational cost at test time low.
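To make the described pipeline concrete, the following is a minimal PyTorch sketch of a patch-based depth regressor, not the authors' implementation: the class name PatchDepthCNN, the 64-pixel patch size, the layer widths, the L2 loss, and the bilinear up-sampling at test time are illustrative assumptions; the abstract only specifies a CNN feature extractor with a linear regression layer on top, trained on large patches centered at each pixel, followed by up-sampling of the coarse prediction.

# Minimal sketch (assumed architecture, not the paper's exact network).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchDepthCNN(nn.Module):
    """CNN that maps an RGB patch centered on a pixel to a single depth value."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Linear regressor as the last layer: one scalar depth per patch.
        self.regressor = nn.Linear(128, 1)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (N, 3, 64, 64) -> (N,) predicted depths
        feats = self.features(patches).flatten(1)
        return self.regressor(feats).squeeze(1)


if __name__ == "__main__":
    model = PatchDepthCNN()

    # Training step on a toy batch of patches with ground-truth depths (L2 regression loss).
    patches = torch.randn(8, 3, 64, 64)
    gt_depth = torch.rand(8) * 80.0  # hypothetical depths in meters
    loss = F.mse_loss(model(patches), gt_depth)
    loss.backward()

    # At test time: predict depth on a coarse grid of patch centers, then up-sample
    # the coarse prediction to the full image resolution (simple bilinear up-sampling).
    coarse = model(torch.randn(12 * 16, 3, 64, 64)).detach().reshape(1, 1, 12, 16)
    depth_map = F.interpolate(coarse, size=(240, 320), mode="bilinear", align_corners=False)
    print(depth_map.shape)  # torch.Size([1, 1, 240, 320])

In this sketch, dense prediction is obtained by evaluating the patch regressor on a grid of pixel centers and interpolating, which mirrors the abstract's "forward pass plus simple up-sampling" at test time.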