
Latest Publications: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)

Multiplicative Noise Channel in Generative Adversarial Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.141
Xinhan Di, Pengqian Yu
Additive Gaussian noise is widely used in generative adversarial networks (GANs). It has been shown that applying additive Gaussian noise increases convergence speed. However, performance measures such as the visual quality of generated samples and semi-classification accuracy are not improved, partly because of the high uncertainty introduced by the additive noise. In this paper, we introduce multiplicative noise, which has lower uncertainty under technical conditions and improves the performance of GANs. To demonstrate its practical use, we conduct two experiments: unsupervised human face generation and semi-classification. The results show that our method improves the state-of-the-art semi-classification accuracy on three benchmarks (CIFAR-10, SVHN and MNIST), as well as the visual quality and variety of samples generated by GANs that use additive Gaussian noise.
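The abstract does not specify how the noise enters the network. Below is a minimal PyTorch sketch of the two channels, assuming the noise is injected into hidden activations with a `(1 + eps)` multiplicative parameterisation; the class name, sigma value, and placement are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class NoiseChannel(nn.Module):
    """Injects noise into hidden activations during training.

    mode="additive":        h + eps,        eps ~ N(0, sigma^2)
    mode="multiplicative":  h * (1 + eps),  eps ~ N(0, sigma^2)

    The multiplicative channel scales the perturbation with the
    activation magnitude, one intuition for its lower uncertainty.
    """
    def __init__(self, sigma=0.1, mode="multiplicative"):
        super().__init__()
        self.sigma = sigma
        self.mode = mode

    def forward(self, h):
        if not self.training:          # no noise at test time
            return h
        eps = torch.randn_like(h) * self.sigma
        return h + eps if self.mode == "additive" else h * (1.0 + eps)

# Usage: drop the channel between layers of, e.g., the discriminator.
block = nn.Sequential(nn.Linear(128, 128), nn.LeakyReLU(0.2),
                      NoiseChannel(sigma=0.1))
out = block(torch.randn(16, 128))
```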
Citations: 3
Computer Vision Meets Geometric Modeling: Multi-view Reconstruction of Surface Points and Normals Using Affine Correspondences
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.286
Levente Hajder, Ivan Eichhardt
A novel surface normal estimator is introduced, using affine-invariant features extracted and tracked across multiple views. Normal estimation is robustified and integrated into our reconstruction pipeline, which achieves increased accuracy compared to the state of the art. Parameters of the views and of the obtained spatial model, including surface normals, are refined by a novel bundle-adjustment-like numerical optimization. This optimization alternates with a novel, robust, view-dependent consistency check for surface normals, which removes normals inconsistent with the multiple-view track. Our algorithms are quantitatively validated on the reverse engineering of geometric elements such as planes, spheres, and cylinders, where we show that the accuracy of the estimated surface properties is appropriate for object detection. The pipeline is also tested on the reconstruction of man-made and free-form objects.
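As a rough illustration of what a view-dependent consistency check can look like, here is a toy NumPy version that keeps a normal only if it faces enough of the cameras in its track; the sign-of-dot-product test and the support threshold are simplifying assumptions, not the paper's actual criterion.

```python
import numpy as np

def filter_normals(points, normals, cam_centers, min_support=0.6):
    """Keep a surface normal only if it faces at least a fraction
    `min_support` of the cameras that observed its point: a normal
    can only have been seen from cameras whose viewing ray opposes it.
    """
    keep = []
    for p, n in zip(points, normals):
        rays = p - cam_centers                      # camera -> point rays
        rays /= np.linalg.norm(rays, axis=1, keepdims=True)
        facing = (rays @ n) < 0.0                   # normal faces the camera
        keep.append(facing.mean() >= min_support)
    return np.array(keep)

# One point seen by three cameras; the third views it from behind.
pts  = np.array([[0.0, 0.0, 0.0]])
nrm  = np.array([[0.0, 0.0, 1.0]])
cams = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 4.0], [0.0, 0.0, -3.0]])
print(filter_normals(pts, nrm, cams))               # [ True ] (2/3 support)
```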
Citations: 8
Color Representation in CNNs: Parallelisms with Biological Vision
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.318
Ivet Rafegas, M. Vanrell
Convolutional Neural Networks (CNNs) trained for object recognition present representational capabilities approaching those of primate visual systems [1]. This provides a computational framework to explore how image features are efficiently represented. Here, we dissect a trained CNN [2] to study how color is represented. We use a classical methodology from physiology: measuring the selectivity index of individual neurons to specific features. We use ImageNet dataset [20] images and synthetic versions of them to quantify the color-tuning properties of artificial neurons and to classify the network population. We identify three main levels of color representation that show parallelisms with biological visual systems: (a) a decomposition in a circular hue space representing single-color regions, with wider hue sampling beyond the first layer (V2); (b) the emergence of opponent low-dimensional spaces in early stages to represent color edges (V1); and (c) a strong entanglement between color and shape patterns representing object parts (e.g., the wheel of a car), object shapes (e.g., faces) or object-surround configurations (e.g., blue sky surrounding an object) in deeper layers (V4 or IT).
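A minimal sketch of a physiology-style selectivity analysis, assuming responses are measured over a set of hue-varying stimuli; the exact index formula and the circular-mean preferred hue below are illustrative choices, not the paper's definitions.

```python
import numpy as np

def selectivity_index(responses):
    """1 when the neuron responds to a single hue, 0 when it responds
    equally to all hues (contrast of peak vs. mean of the rest)."""
    r = np.clip(np.asarray(responses, float), 0.0, None)
    if r.max() == 0.0:
        return 0.0
    rest = (r.sum() - r.max()) / (len(r) - 1)
    return float((r.max() - rest) / (r.max() + rest + 1e-12))

def preferred_hue(responses, hues_deg):
    """Response-weighted circular mean, i.e. the neuron's preferred
    direction in a circular hue space."""
    r = np.clip(np.asarray(responses, float), 0.0, None)
    ang = np.deg2rad(np.asarray(hues_deg, float))
    return float(np.rad2deg(np.angle((r * np.exp(1j * ang)).sum())) % 360.0)

hues = np.arange(0, 360, 30)                      # 12 hue-varying stimuli
resp = np.exp(-0.5 * ((hues - 120) / 25.0) ** 2)  # a neuron tuned near 120 deg
print(selectivity_index(resp), preferred_hue(resp, hues))
```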
Citations: 12
A Vision-Based System for In-Bed Posture Tracking
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.163
Shuangjun Liu, S. Ostadabbas
Tracking human sleeping postures over time provides critical information for biomedical research, including studies on sleeping behaviors and bedsore prevention. In this paper, we introduce a vision-based tracking system for pervasive yet unobtrusive long-term monitoring of in-bed postures in different environments. Once trained, our system generates an in-bed posture tracking history (iPoTH) report by applying a hierarchical inference model to top-view videos collected from any regular off-the-shelf camera. Although based on a supervised learning structure, our model is person-independent: it can be trained off-line and applied to new users without additional training. Experiments were conducted in both a simulated hospital environment and a home-like setting. In the hospital setting, posture detection accuracy using several mannequins was up to 91.0%, while a test with actual human participants in the home-like setting showed an accuracy of 93.6%.
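The abstract leaves the hierarchical inference model unspecified; one plausible stand-in is per-frame posture scores smoothed by a Viterbi pass that penalises implausibly fast posture switches. The posture set and switch cost below are illustrative assumptions.

```python
import numpy as np

def smooth_track(frame_probs, switch_cost=2.0):
    """Viterbi smoothing of per-frame posture probabilities: every
    change of posture between consecutive frames costs `switch_cost`
    in negative log-likelihood, suppressing one-frame flickers."""
    T, K = frame_probs.shape
    cost = -np.log(frame_probs + 1e-9)
    dp, back = cost[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        trans = dp[:, None] + switch_cost * (1.0 - np.eye(K))
        back[t] = trans.argmin(axis=0)              # best previous state
        dp = trans.min(axis=0) + cost[t]
    path = [int(dp.argmin())]
    for t in range(T - 1, 0, -1):                   # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]

postures = ["supine", "left side", "right side", "prone"]
probs = np.random.dirichlet(np.ones(4), size=30)    # stand-in classifier output
print([postures[k] for k in smooth_track(probs)][:8])
```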
Citations: 26
Mind the Gap: Virtual Shorelines for Blind and Partially Sighted People
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.171
Daniel Koester, R. Stiefelhagen, Maximilian Awiszus
Blind and partially sighted people have encountered numerous devices intended to improve their mobility and orientation, yet most still rely on traditional aids such as the white cane or a guide dog. In this paper, we consider improving the actual orientation process through the creation of routes that are better suited to specific needs. More precisely, this work focuses on routing for blind and partially sighted people at a shoreline-like level of detail, modeled after real-world white-cane usage. Our system creates such fine-grained routes by extracting routing features, e.g., building facades and road crossings, from openly available geolocation data. More importantly, the generated routes provide a measurable safety benefit, as they reduce the number of unmarked pedestrian crossings and favor much more accessible alternatives. Our evaluation shows that such fine-grained routing can improve users' safety and their understanding of the environment ahead, especially the upcoming route and its impediments.
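To make the safety trade-off concrete, here is a small Dijkstra sketch over a hypothetical sidewalk graph in which unmarked pedestrian crossings carry an extra edge penalty; the graph layout and penalty value are assumptions, not the paper's routing engine.

```python
import heapq

def safest_route(graph, start, goal, crossing_penalty=50.0):
    """Dijkstra over a sidewalk graph where each edge costs its length
    plus a penalty if it is an unmarked pedestrian crossing, so the
    cheapest path trades distance for safety.

    graph: {node: [(neighbour, length_m, is_unmarked_crossing), ...]}
    """
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):           # stale heap entry
            continue
        for v, length, unmarked in graph[u]:
            w = length + (crossing_penalty if unmarked else 0.0)
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    path, node = [goal], goal
    while node != start:                            # rebuild the path
        node = prev[node]
        path.append(node)
    return path[::-1]

# B-D is shorter but uses an unmarked crossing, so the detour wins.
g = {"A": [("B", 10, False)],
     "B": [("D", 20, True), ("C", 15, False)],
     "C": [("D", 15, False)],
     "D": []}
print(safest_route(g, "A", "D"))                    # ['A', 'B', 'C', 'D']
```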
Citations: 5
The 3D Menpo Facial Landmark Tracking Challenge
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.16
S. Zafeiriou, Grigorios G. Chrysos, A. Roussos, Evangelos Ververas, Jiankang Deng, George Trigeorgis
Recently, deformable face alignment has become synonymous with the task of locating a set of sparse 2D landmarks in intensity images. Currently, discriminatively trained Deep Convolutional Neural Networks (DCNNs) are the state of the art in face alignment. DCNNs exploit the large amount of high-quality annotations that have emerged in the last few years. Nevertheless, the provided 2D annotations rarely capture the 3D structure of the face (this is especially evident at the facial boundary); that is, the annotations neither provide an estimate of depth nor correspond to the 2D projections of the 3D facial structure. This paper summarises our efforts (a) to develop a very large database suitable for training 3D face alignment algorithms on images captured "in-the-wild" and (b) to train and evaluate new methods for 3D face landmark tracking. Finally, we report the results of the first challenge in 3D face tracking "in-the-wild".
Citations: 51
Eliminating the Observer Effect: Shadow Removal in Orthomosaics of the Road Network
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.40
S. Tanathong, W. Smith, Stephen Remde
High-resolution images of the road surface can be obtained cheaply and quickly by driving a vehicle around the road network equipped with a camera oriented towards the road surface. If camera calibration information is available and accurate estimates of the camera pose can be made, the images can be stitched into an orthomosaic (i.e., a mosaiced image approximating an orthographic view), providing a virtual top-down view of the road network. However, the vehicle capturing the images changes the scene: it casts a shadow onto the road surface that is sometimes visible in the captured images, causing large artefacts in the stitched orthomosaic. In this paper, we propose a model-based solution to this problem. We capture a 3D model of the vehicle, transform it to a canonical pose, and use it in conjunction with a model of sun geometry to predict shadow masks by ray casting. Shadow masks are precomputed, stored in a look-up table, and used to generate per-pixel weights for stitching. We integrate this approach into a pipeline for pose estimation and gradient-domain stitching that, as we show, is capable of producing shadow-free, high-quality orthomosaics from uncontrolled, real-world datasets.
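A simplified stand-in for the shadow-mask step: project the vertices of a vehicle model along the sun direction onto the ground plane and rasterise the hits. A full implementation would ray cast the mesh rather than its vertices, and the grid parameters are illustrative assumptions.

```python
import numpy as np

def shadow_mask(vehicle_pts, sun_azimuth_deg, sun_elevation_deg,
                grid_res=0.05, grid_extent=5.0):
    """Project vehicle-model vertices along the sun direction onto the
    ground plane z=0 and rasterise the hits into a boolean mask whose
    cells mark potentially shadowed road surface."""
    az, el = np.deg2rad(sun_azimuth_deg), np.deg2rad(sun_elevation_deg)
    d = np.array([-np.cos(el) * np.sin(az),       # unit ray from sun to scene
                  -np.cos(el) * np.cos(az),
                  -np.sin(el)])
    t = -vehicle_pts[:, 2] / d[2]                 # ray-plane intersection, z=0
    ground = vehicle_pts + t[:, None] * d
    n = int(2 * grid_extent / grid_res)
    mask = np.zeros((n, n), dtype=bool)
    ij = np.floor((ground[:, :2] + grid_extent) / grid_res).astype(int)
    ok = (ij >= 0).all(axis=1) & (ij < n).all(axis=1)
    mask[ij[ok, 1], ij[ok, 0]] = True
    return mask

# Shadow of a 4m x 2m x 2m point-sampled "vehicle" under a morning sun.
pts = np.random.rand(500, 3) * np.array([4.0, 2.0, 2.0])
print(shadow_mask(pts, 120.0, 35.0).sum(), "shadowed cells")
```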
Citations: 2
RGB-D Object Recognition Using Deep Convolutional Neural Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.109
Saman Zia, Buket Yüksel, Deniz Yuret, Y. Yemez
We address the problem of object recognition from RGB-D images using deep convolutional neural networks (CNNs). We advocate the use of 3D CNNs to fully exploit the 3D spatial information in depth images, as well as the use of pretrained 2D CNNs to learn features from RGB-D images. In contrast to RGB data, there is currently no large-scale dataset available that comprises depth information; hence, transfer learning from 2D source data is key to training deep 3D CNNs. To this end, we propose a hybrid 2D/3D convolutional neural network that can be initialized with pretrained 2D CNNs and then trained over a relatively small RGB-D dataset. We conduct experiments on the Washington dataset, which contains RGB-D images of small household objects. Our experiments show that the features learnt from this hybrid structure, when fused with features learnt from depth-only and RGB-only architectures, outperform the state of the art on RGB-D category recognition.
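A toy version of such a hybrid two-stream network, assuming a 2D stream over the RGB image and a 3D stream over a voxelised depth volume fused by concatenation; layer sizes and the 32^3 grid are illustrative, and the paper additionally initialises the streams from pretrained 2D networks.

```python
import torch
import torch.nn as nn

class HybridRGBD(nn.Module):
    """Two-stream sketch: a 2D CNN over RGB and a 3D CNN over a
    voxelised depth volume, fused by concatenation before the
    classifier head (51 classes, as in the Washington dataset)."""
    def __init__(self, n_classes=51):
        super().__init__()
        self.rgb = nn.Sequential(               # 2D stream, 3x64x64 input
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.vox = nn.Sequential(               # 3D stream, 1x32x32x32 input
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.head = nn.Linear(32 + 16, n_classes)

    def forward(self, rgb, voxels):
        return self.head(torch.cat([self.rgb(rgb), self.vox(voxels)], dim=1))

net = HybridRGBD()
logits = net(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 32, 32, 32))
print(logits.shape)                             # torch.Size([2, 51])
```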
Citations: 43
Deep Learning of Convolutional Auto-Encoder for Image Matching and 3D Object Reconstruction in the Infrared Range
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.252
V. Knyaz, O. Vygolov, V. Kniaz, Y. Vizilter, V. Gorbatsevich, T. Luhmann, N. Conen
Performing image matching in thermal images is challenging due to the absence of distinctive features and the presence of thermal reflections. Still, in many applications, infrared imagery is an attractive solution for 3D object reconstruction that is robust to low-light conditions. We present an image patch matching method based on deep learning: for image matching in the infrared range, we use codes generated by a convolutional auto-encoder. We evaluate the method in a full 3D object reconstruction pipeline that uses infrared imagery as its input. Image matches found using the proposed method are used to estimate the camera pose, and dense 3D object reconstruction is performed using semi-global block matching. Evaluation on a dataset with real and synthetic images shows that our method outperforms existing image matching methods on infrared imagery. We also evaluate the geometry of the generated 3D models to demonstrate the increased reconstruction accuracy.
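A minimal sketch of matching by auto-encoder codes: the encoder half of a convolutional auto-encoder (trained elsewhere with a mirrored decoder and reconstruction loss) maps patches to codes, and patches are matched by nearest code distance. The architecture and 64x64 patch size are assumptions.

```python
import torch
import torch.nn as nn

class PatchEncoder(nn.Module):
    """Encoder half of a convolutional auto-encoder for thermal
    patches: once trained, patches are compared by the distance
    between their codes instead of raw intensities."""
    def __init__(self, code_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(), nn.Linear(64 * 8 * 8, code_dim))

    def forward(self, x):
        return self.net(x)

def match_patches(enc, left, right):
    """Nearest-neighbour matching of patch codes by L2 distance."""
    with torch.no_grad():
        a, b = enc(left), enc(right)
    return torch.cdist(a, b).argmin(dim=1)      # index of best match in `right`

enc = PatchEncoder()
idx = match_patches(enc, torch.randn(5, 1, 64, 64), torch.randn(8, 1, 64, 64))
print(idx)
```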
Citations: 33
Compact Feature Representation for Image Classification Using ELMs
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.124
Dongshun Cui, Guanghao Zhang, Wei Han
Feature representation/learning is an essential step for many computer vision tasks (such as image classification) and is broadly categorized into 1) deep feature representation and 2) shallow feature representation. With the development of deep neural networks, many deep feature representation methods have been proposed and have obtained remarkable results. However, they are limited in real-world applications due to their high demands on storage space and computational resources. In our work, we focus on shallow feature representation (like PCANet), as these algorithms require less storage space and fewer computational resources. In this paper, we propose a Compact Feature Representation algorithm (CFR-ELM) using an Extreme Learning Machine (ELM) under a shallow network framework. CFR-ELM consists of compact feature learning modules and a post-processing module. Each feature learning module in CFR-ELM performs the following operations: 1) patch-based mean removal; 2) an ELM auto-encoder (ELM-AE) to learn features; and 3) max pooling to make the features more compact. The post-processing module is inserted after each feature learning module and simplifies the learned features by hashing and block-wise histograms. We have tested CFR-ELM on four typical image classification databases, and the results demonstrate that our method outperforms the state-of-the-art methods.
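A sketch of one CFR-ELM feature learning module under simplifying assumptions (sizes and regulariser are illustrative): per-patch mean removal, an ELM auto-encoder whose output weights come from one closed-form ridge-regression solve (no backpropagation), then max pooling.

```python
import numpy as np

def elm_ae(X, n_hidden, reg=1e-3, seed=0):
    """ELM auto-encoder: random orthogonalised input weights, sigmoid
    hidden layer, and output weights `beta` solved in closed form so
    that H @ beta reconstructs X; features are then X @ beta.T."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W, _ = np.linalg.qr(rng.standard_normal((d, n_hidden)))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # random sigmoid hidden layer
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return X @ beta.T                              # features for the next module

patches = np.random.rand(100, 64)                  # 100 flattened 8x8 patches
patches -= patches.mean(axis=1, keepdims=True)     # 1) patch-based mean removal
feats = elm_ae(patches, n_hidden=64)               # 2) ELM-AE features
fmap = feats[0].reshape(8, 8)
pooled = fmap.reshape(4, 2, 4, 2).max(axis=(1, 3))  # 3) 2x2 max pooling
print(pooled.shape)                                # (4, 4)
```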
Citations: 8