Additive Gaussian noise is widely used in generative adversarial networks (GANs). It has been shown that additive Gaussian noise increases convergence speed; however, it does not improve performance measures such as the visual quality of generated samples and semi-supervised classification accuracy. This is partially due to the high uncertainty introduced by the additive noise. In this paper, we introduce multiplicative noise, which has lower uncertainty under technical conditions and improves the performance of GANs. To demonstrate its practical use, we conduct two experiments: unsupervised human face generation and semi-supervised classification. The results show that it improves the state-of-the-art semi-supervised classification accuracy on three benchmarks (CIFAR-10, SVHN and MNIST), as well as the visual quality and variety of generated samples compared with GANs that use additive Gaussian noise.
{"title":"Multiplicative Noise Channel in Generative Adversarial Networks","authors":"Xinhan Di, Pengqian Yu","doi":"10.1109/ICCVW.2017.141","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.141","url":null,"abstract":"Additive Gaussian noise is widely used in generative adversarial networks (GANs). It is shown that the convergence speed is increased through the application of the additive Gaussian noise. However, the performance such as the visual quality of generated samples and semiclassification accuracy is not improved. This is partially due to the high uncertainty introduced by the additive noise. In this paper, we introduce multiplicative noise which has lower uncertainty under technical conditions, and it improves the performance of GANs. To demonstrate its practical use, two experiments including unsupervised human face generation and semi-classification tasks are conducted. The results show that it improves the state-of-art semi-classification accuracy on three benchmarks including CIFAR-10, SVHN and MNIST, as well as the visual quality and variety of generated samples on GANs with the additive Gaussian noise.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133394151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel surface normal estimator is introduced using affine-invariant features extracted and tracked across multiple views. Normal estimation is robustified and integrated into our reconstruction pipeline, which achieves higher accuracy than the state of the art. The parameters of the views and the obtained spatial model, including surface normals, are refined by a novel bundle adjustment-like numerical optimization. The optimization alternates with a novel, robust, view-dependent consistency check for surface normals that removes normals inconsistent with the multiple-view track. Our algorithms are quantitatively validated on the reverse engineering of geometric elements such as planes, spheres, and cylinders. We show that the accuracy of the estimated surface properties is appropriate for object detection. The pipeline is also tested on the reconstruction of man-made and free-form objects.
{"title":"Computer Vision Meets Geometric Modeling: Multi-view Reconstruction of Surface Points and Normals Using Affine Correspondences","authors":"Levente Hajder, Ivan Eichhardt","doi":"10.1109/ICCVW.2017.286","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.286","url":null,"abstract":"A novel surface normal estimator is introduced using affine-invariant features extracted and tracked across multiple views. Normal estimation is robustified and integrated into our reconstruction pipeline that has increased accuracy compared to the State-of-the-Art. Parameters of the views and the obtained spatial model, including surface normals, are refined by a novel bundle adjustment-like numerical optimization. The process is an alternation with a novel robust view-dependent consistency check for surface normals, removing normals inconsistent with the multiple-view track. Our algorithms are quantitatively validated on the reverse engineering of geometrical elements such as planes, spheres, or cylinders. It is shown here that the accuracy of the estimated surface properties is appropriate for object detection. The pipeline is also tested on the reconstruction of man-made and free-form objects.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132279517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolutional Neural Networks (CNNs) trained for object recognition tasks present representational capabilities approaching those of primate visual systems [1]. This provides a computational framework to explore how image features are efficiently represented. Here, we dissect a trained CNN [2] to study how color is represented. We use a classical methodology from physiology: measuring the selectivity index of individual neurons to specific features. We use images from the ImageNet dataset [20] and synthetic versions of them to quantify the color tuning properties of artificial neurons and provide a classification of the network population. We identify three main levels of color representation showing parallelisms with biological visual systems: (a) a decomposition in a circular hue space to represent single-color regions, with wider hue sampling beyond the first layer (V2); (b) the emergence of opponent low-dimensional spaces in early stages to represent color edges (V1); and (c) a strong entanglement between color and shape patterns representing object parts (e.g., the wheel of a car), object shapes (e.g., faces), or object-surround configurations (e.g., blue sky surrounding an object) in deeper layers (V4 or IT).
{"title":"Color Representation in CNNs: Parallelisms with Biological Vision","authors":"Ivet Rafegas, M. Vanrell","doi":"10.1109/ICCVW.2017.318","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.318","url":null,"abstract":"Convolutional Neural Networks (CNNs) trained for object recognition tasks present representational capabilities approaching to primate visual systems [1]. This provides a computational framework to explore how image features are efficiently represented. Here, we dissect a trained CNN [2] to study how color is represented. We use a classical methodology used in physiology that is measuring index of selectivity of individual neurons to specific features. We use ImageNet Dataset [20] images and synthetic versions of them to quantify color tuning properties of artificial neurons to provide a classification of the network population. We conclude three main levels of color representation showing some parallelisms with biological visual systems: (a) a decomposition in a circular hue space to represent single color regions with a wider hue sampling beyond the first layer (V2), (b) the emergence of opponent low-dimensional spaces in early stages to represent color edges (V1); and (c) a strong entanglement between color and shape patterns representing object-parts (e.g. wheel of a car), object-shapes (e.g. faces) or object-surrounds configurations (e.g. blue sky surrounding an object) in deeper layers (V4 or IT).","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"43 51","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133783742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracking human sleeping postures over time provides critical information to biomedical research, including studies on sleeping behaviors and bedsore prevention. In this paper, we introduce a vision-based tracking system for pervasive yet unobtrusive long-term monitoring of in-bed postures in different environments. Once trained, our system generates an in-bed posture tracking history (iPoTH) report by applying a hierarchical inference model to top-view videos collected from any regular off-the-shelf camera. Although based on a supervised learning structure, our model is person-independent: it can be trained off-line and applied to new users without additional training. Experiments were conducted in both a simulated hospital environment and a home-like setting. In the hospital setting, posture detection accuracy using several mannequins was up to 91.0%, while the test with actual human participants in a home-like setting showed an accuracy of 93.6%.
{"title":"A Vision-Based System for In-Bed Posture Tracking","authors":"Shuangjun Liu, S. Ostadabbas","doi":"10.1109/ICCVW.2017.163","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.163","url":null,"abstract":"Tracking human sleeping postures over time provides critical information to biomedical research including studies on sleeping behaviors and bedsore prevention. In this paper, we introduce a vision-based tracking system for pervasive yet unobtrusive long-term monitoring of in-bed postures in different environments. Once trained, our system generates an in-bed posture tracking history (iPoTH) report by applying a hierarchical inference model on the top view videos collected from any regular off-the-shelf camera. Although being based on a supervised learning structure, our model is person-independent and can be trained off-line and applied to new users without additional training. Experiments were conducted in both a simulated hospital environment and a home-like setting. In the hospital setting, posture detection accuracy using several mannequins was up to 91.0%, while the test with actual human participants in a home-like setting showed an accuracy of 93.6%.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"430 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124233440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blind and partially sighted people have encountered numerous devices to improve their mobility and orientation, yet most still rely on traditional techniques such as the white cane or a guide dog. In this paper, we consider improving the actual orientation process through the creation of routes that are better suited to specific needs. More precisely, this work focuses on routing for blind and partially sighted people at a shoreline-like level of detail, modeled after real-world white cane usage. Our system creates such fine-grained routes by extracting routing features, e.g., building facades and road crossings, from openly available geolocation data. More importantly, the generated routes provide a measurable safety benefit, as they reduce the number of unmarked pedestrian crossings and favor more accessible alternatives. Our evaluation shows that such fine-grained routing can improve users' safety and their understanding of the environment lying ahead, especially the upcoming route and its impediments.
{"title":"Mind the Gap: Virtual Shorelines for Blind and Partially Sighted People","authors":"Daniel Koester, R. Stiefelhagen, Maximilian Awiszus","doi":"10.1109/ICCVW.2017.171","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.171","url":null,"abstract":"Blind and partially sighted people have encountered numerous devices to improve their mobility and orientation, yet most still rely on traditional techniques, such as the white cane or a guide dog. In this paper, we consider improving the actual orientation process through the creation of routes that are better suited towards specific needs. More precisely, this work focuses on routing for blind and partially sighted people on a shoreline like level of detail, modeled after real world white cane usage. Our system is able to create such fine-grained routes through the extraction of routing features from openly available geolocation data, e.g., building facades and road crossings. More importantly, the generated routes provide a measurable safety benefit, as they reduce the number of unmarked pedestrian crossings and try to utilize much more accessible alternatives. Our evaluation shows that such a fine-grained routing can improve users' safety and improve their understanding of the environment lying ahead, especially the upcoming route and its impediments.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116992472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, deformable face alignment has become synonymous with the task of locating a set of sparse 2D landmarks in intensity images. Discriminatively trained Deep Convolutional Neural Networks (DCNNs) are currently the state of the art in face alignment; they exploit the large amounts of high-quality annotations that have emerged in the last few years. Nevertheless, the provided 2D annotations rarely capture the 3D structure of the face (this is especially evident at the facial boundary). That is, the annotations neither provide an estimate of depth nor correspond to the 2D projections of the 3D facial structure. This paper summarises our efforts (a) to develop a very large database suitable for training 3D face alignment algorithms on images captured "in-the-wild" and (b) to train and evaluate new methods for 3D face landmark tracking. Finally, we report the results of the first challenge in 3D face tracking "in-the-wild".
{"title":"The 3D Menpo Facial Landmark Tracking Challenge","authors":"S. Zafeiriou, Grigorios G. Chrysos, A. Roussos, Evangelos Ververas, Jiankang Deng, George Trigeorgis","doi":"10.1109/ICCVW.2017.16","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.16","url":null,"abstract":"Recently, deformable face alignment is synonymous to the task of locating a set of 2D sparse landmarks in intensity images. Currently, discriminatively trained Deep Convolutional Neural Networks (DCNNs) are the state-of-the-art in the task of face alignment. DCNNs exploit large amount of high quality annotations that emerged the last few years. Nevertheless, the provided 2D annotations rarely capture the 3D structure of the face (this is especially evident in the facial boundary). That is, the annotations neither provide an estimate of the depth nor correspond to the 2D projections of the 3D facial structure. This paper summarises our efforts to develop (a) a very large database suitable to be used to train 3D face alignment algorithms in images captured \"in-the-wild\" and (b) to train and evaluate new methods for 3D face landmark tracking. Finally, we report the results of the first challenge in 3D face tracking \"in-the-wild\".","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125806894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-resolution images of the road surface can be obtained cheaply and quickly by driving a vehicle equipped with a camera oriented towards the road surface around the road network. If camera calibration information is available and accurate estimates of the camera pose can be made, then the images can be stitched into an orthomosaic (i.e., a mosaiced image approximating an orthographic view), providing a virtual top-down view of the road network. However, the vehicle capturing the images changes the scene: it casts a shadow onto the road surface that is sometimes visible in the captured images. This causes large artefacts in the stitched orthomosaic. In this paper, we propose a model-based solution to this problem. We capture a 3D model of the vehicle, transform it to a canonical pose and use it in conjunction with a model of sun geometry to predict shadow masks by ray casting. Shadow masks are precomputed, stored in a lookup table and used to generate per-pixel weights for stitching. We integrate this approach into a pipeline for pose estimation and gradient-domain stitching that we show is capable of producing shadow-free, high-quality orthomosaics from uncontrolled, real-world datasets.
{"title":"Eliminating the Observer Effect: Shadow Removal in Orthomosaics of the Road Network","authors":"S. Tanathong, W. Smith, Stephen Remde","doi":"10.1109/ICCVW.2017.40","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.40","url":null,"abstract":"High resolution images of the road surface can be obtained cheaply and quickly by driving a vehicle around the road network equipped with a camera oriented towards the road surface. If camera calibration information is available and accurate estimates of the camera pose can be made then the images can be stitched into an orthomosaic (i.e. a mosaiced image approximating an orthographic view) providing a virtual top down view of the road network. However, the vehicle capturing the images changes the scene: it casts a shadow onto the road surface that is sometimes visible in the captured images. This causes large artefacts in the stitched orthomosaic. In this paper, we propose a model-based solution to this problem. We capture a 3D model of the vehicle, transform it to a canonical pose and use it in conjunction with a model of sun geometry to predict shadow masks by ray casting. Shadow masks are precomputed, stored in a look up table and used to generate per-pixel weights for stitching. We integrate this approach into a pipeline for pose estimation and gradient domain stitching that we show is capable of producing shadow-free, high quality orthomosaics from uncontrolled, real world datasets.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128341336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We address the problem of object recognition from RGB-D images using deep convolutional neural networks (CNNs). We advocate the use of 3D CNNs to fully exploit the 3D spatial information in depth images, as well as the use of pretrained 2D CNNs to learn features from RGB-D images. Compared to RGB data, there is currently no large-scale dataset available that comprises depth information; hence, transfer learning from 2D source data is key to being able to train deep 3D CNNs. To this end, we propose a hybrid 2D/3D convolutional neural network that can be initialized with pretrained 2D CNNs and then trained over a relatively small RGB-D dataset. We conduct experiments on the Washington dataset, which contains RGB-D images of small household objects. Our experiments show that the features learnt from this hybrid structure, when fused with the features learnt from depth-only and RGB-only architectures, outperform the state of the art on RGB-D category recognition.
{"title":"RGB-D Object Recognition Using Deep Convolutional Neural Networks","authors":"Saman Zia, Buket Yüksel, Deniz Yuret, Y. Yemez","doi":"10.1109/ICCVW.2017.109","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.109","url":null,"abstract":"We address the problem of object recognition from RGB-D images using deep convolutional neural networks (CNNs). We advocate the use of 3D CNNs to fully exploit the 3D spatial information in depth images as well as the use of pretrained 2D CNNs to learn features from RGB-D images. There exists currently no large scale dataset available comprising depth information as compared to those for RGB data. Hence transfer learning from 2D source data is key to be able to train deep 3D CNNs. To this end, we propose a hybrid 2D/3D convolutional neural network that can be initialized with pretrained 2D CNNs and can then be trained over a relatively small RGB-D dataset. We conduct experiments on the Washington dataset involving RGB-D images of small household objects. Our experiments show that the features learnt from this hybrid structure, when fused with the features learnt from depth-only and RGB-only architectures, outperform the state of the art on RGB-D category recognition.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128741854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image matching in thermal images is challenging due to the absence of distinctive features and the presence of thermal reflections. Still, in many applications, infrared imagery is an attractive option for 3D object reconstruction because it is robust to low-light conditions. We present an image patch matching method based on deep learning. For image matching in the infrared range, we use codes generated by a convolutional auto-encoder. We evaluate the method in a full 3D object reconstruction pipeline that takes infrared imagery as input: image matches found with the proposed method are used to estimate the camera pose, and dense 3D object reconstruction is performed using semi-global block matching. We evaluate on a dataset with real and synthetic images and show that our method outperforms existing image matching methods on infrared imagery. We also evaluate the geometry of the generated 3D models to demonstrate the increased reconstruction accuracy.
{"title":"Deep Learning of Convolutional Auto-Encoder for Image Matching and 3D Object Reconstruction in the Infrared Range","authors":"V. Knyaz, O. Vygolov, V. Kniaz, Y. Vizilter, V. Gorbatsevich, T. Luhmann, N. Conen","doi":"10.1109/ICCVW.2017.252","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.252","url":null,"abstract":"Performing image matching in thermal images is challenging due to an absence of distinctive features and presence of thermal reflections. Still, in many applications, infrared imagery is an attractive solution for 3D object reconstruction that is robust against low light conditions. We present an image patch matching method based on deep learning. For image matching in the infrared range, we use codes generated by a convolutional auto-encoder. We evaluate the method in a full 3D object reconstruction pipeline that uses infrared imagery as an input. Image matches found using the proposed method are used for estimation of the camera pose. Dense 3D object reconstruction is performed using semi-global block matching. We evaluate on a dataset with real and synthetic images to show that our method outperforms existing image matching methods on the infrared imagery. We also evaluate the geometry of generated 3D models to demonstrate the increased reconstruction accuracy.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129006326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature representation/learning is an essential step for many computer vision tasks (such as image classification) and is broadly categorized as 1) deep feature representation and 2) shallow feature representation. With the development of deep neural networks, many deep feature representation methods have been proposed and have obtained remarkable results. However, they are limited in real-world applications due to their high demands on storage space and computation. In our work, we focus on shallow feature representation (like PCANet), as these algorithms require less storage space and fewer computational resources. In this paper, we propose a Compact Feature Representation algorithm (CFR-ELM) using the Extreme Learning Machine (ELM) under a shallow network framework. CFR-ELM consists of a compact feature learning module and a post-processing module. Each feature learning module in CFR-ELM performs the following operations: 1) patch-based mean removal; 2) an ELM auto-encoder (ELM-AE) to learn features; 3) max pooling to make the features more compact. The post-processing module is inserted after the feature learning module and simplifies the features learned by the feature learning modules via hashing and block-wise histograms. We have tested CFR-ELM on four typical image classification databases, and the results demonstrate that our method outperforms the state-of-the-art methods.
{"title":"Compact Feature Representation for Image Classification Using ELMs","authors":"Dongshun Cui, Guanghao Zhang, Wei Han","doi":"10.1109/ICCVW.2017.124","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.124","url":null,"abstract":"Feature representation/learning is an essential step for many computer vision tasks (like image classification) and is broadly categorized as 1) deep feature representation; 2) shallow feature representation. With the development of deep neural networks, many deep feature representation methods have been proposed and obtained many remarkable results. However, they are limited to real-world applications due to the high demand for storage space and computation ability. In our work, we focus on shallow feature representation (like PCANet) as these algorithms require less storage space and computational resources. In this paper, we have proposed a Compact Feature Representation algorithm (CFR-ELM) by using Extreme Learning Machine (ELM) under a shallow network framework. CFR-ELM consists of compact feature learning module and a post-processing module. Each feature learning module in CRF-ELM performs the following operations: 1) patch-based mean removal; 2) ELM auto-encoder (ELM-AE) to learn features; 3) Max pooling to make the features more compact. Post-processing module is inserted after the feature learning module and simplifies the features learn by the feature learning modules by hashing and block-wise histogram. We have tested CFR-ELM on four typical image classification databases, and the results demonstrate that our method outperforms the state-of-the-art methods.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130542051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}