Computer Vision for the Visually Impaired: the Sound of Vision System
S. Caraiman, A. Morar, Mateusz Owczarek, A. Burlacu, D. Rzeszotarski, N. Botezatu, P. Herghelegiu, F. Moldoveanu, P. Strumiłło, A. Moldoveanu
DOI: 10.1109/ICCVW.2017.175
This paper presents a computer vision-based sensory substitution device for the visually impaired. Its main objective is to provide users with a 3D representation of the surrounding environment, conveyed by means of the hearing and tactile senses. One of the biggest challenges for this system is to ensure pervasiveness, i.e., to be usable in any indoor or outdoor environment and under any illumination conditions. This work describes both the hardware (3D acquisition system) and software (3D processing pipeline) used to develop this sensory substitution device and provides insight into its use in various scenarios. Preliminary experiments with blind users revealed good usability results and provided valuable feedback for system improvement.
{"title":"Computer Vision for the Visually Impaired: the Sound of Vision System","authors":"S. Caraiman, A. Morar, Mateusz Owczarek, A. Burlacu, D. Rzeszotarski, N. Botezatu, P. Herghelegiu, F. Moldoveanu, P. Strumiłło, A. Moldoveanu","doi":"10.1109/ICCVW.2017.175","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.175","url":null,"abstract":"This paper presents a computer vision based sensory substitution device for the visually impaired. Its main objective is to provide the users with a 3D representation of the environment around them, conveyed by means of the hearing and tactile senses. One of the biggest challenges for this system is to ensure pervasiveness, i.e., to be usable in any indoor or outdoor environments and in any illumination conditions. This work reveals both the hardware (3D acquisition system) and software (3D processing pipeline) used for developing this sensory substitution device and provides insight on its exploitation in various scenarios. Preliminary experiments with blind users revealed good usability results and provided valuable feedback for system improvement.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133349540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic Mode Decomposition for Background Modeling
Seth D. Pendergrass, S. Brunton, J. Kutz, N. Benjamin Erichson, T. Askham
DOI: 10.1109/ICCVW.2017.220
The Dynamic Mode Decomposition (DMD) is a spatiotemporal matrix decomposition method capable of background modeling in video streams. DMD is a regression technique that integrates Fourier transforms and singular value decomposition. Innovations in compressed sensing allow for a scalable and rapid decomposition of video streams that scales with the intrinsic rank of the matrix rather than with the size of the actual video. Our results show that the quality of the resulting background model is competitive, as quantified by F-measure, recall, and precision. A GPU-accelerated (graphics processing unit) implementation is also possible, allowing the algorithm to operate efficiently on streaming data. In addition, it is possible to leverage the native compressed format of many data streams, such as HD video and computational physics codes that are represented sparsely in the Fourier domain, to massively reduce data transfer from CPU to GPU and to enable sparse matrix multiplications.
{"title":"Dynamic Mode Decomposition for Background Modeling","authors":"Seth D. Pendergrass, S. Brunton, J. Kutz, N. Benjamin Erichson, T. Askham","doi":"10.1109/ICCVW.2017.220","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.220","url":null,"abstract":"The Dynamic Mode Decomposition (DMD) is a spatiotemporal matrix decomposition method capable of background modeling in video streams. DMD is a regression technique that integrates Fourier transforms and singular value decomposition. Innovations in compressed sensing allow for a scalable and rapid decomposition of video streams that scales with the intrinsic rank of the matrix, rather than the size of the actual video. Our results show that the quality of the resulting background model is competitive, quantified by the F-measure, recall and precision. A GPU (graphics processing unit) accelerated implementation is also possible allowing the algorithm to operate efficiently on streaming data. In addition, it is possible to leverage the native compressed format of many data streams, such as HD video and computational physics codes that are represented sparsely in the Fourier domain, to massively reduce data transfer from CPU to GPU and to enable sparse matrix multiplications.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133647232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PVNN: A Neural Network Library for Photometric Vision
Ye Yu, W. Smith
DOI: 10.1109/ICCVW.2017.69
In this paper we show how a differentiable, physics-based renderer suitable for photometric vision tasks can be implemented as layers in a deep neural network. The layers include geometric operations for representation transformations, reflectance evaluations with arbitrary numbers of light sources, and statistical bidirectional reflectance distribution function (BRDF) models. We make an implementation of these layers available as a neural network library (PVNN) for Theano. The layers can be incorporated into any neural network architecture, allowing parts of the photometric image formation process to be explicitly modelled in a network that is trained end to end via backpropagation. As an exemplar application, we show how to train a network with an encoder-decoder architecture that learns to estimate BRDF parameters from a single image in an unsupervised manner.
Siamese Networks for Chromosome Classification
Swati, Gaurav Gupta, Mohit Yadav, Monika Sharma, L. Vig
DOI: 10.1109/ICCVW.2017.17
Karyotyping is the process of pairing and ordering the 23 pairs of human chromosomes from cell images on the basis of size, centromere position, and banding pattern. Karyotyping during metaphase is often used by clinical cytogeneticists to analyze human chromosomes for diagnostic purposes. It requires experience, domain expertise, and considerable manual effort to perform karyotyping efficiently and to diagnose various disorders. Therefore, automation, or even partial automation, is highly desirable to assist technicians and reduce the cognitive load necessary for karyotyping. With these motivations, in this paper we attempt to develop methods for chromosome classification by borrowing the latest ideas from deep learning. More specifically, we straighten the chromosomes and feed them into Siamese networks to push the embeddings of samples with the same label closer together. Further, we propose balanced sampling from the pairwise dataset when selecting dissimilar training pairs for the Siamese networks, and an MLP-based prediction on top of the embeddings obtained from the trained Siamese networks. We perform our experiments on a real-world dataset of healthy patients collected from a hospital and exhaustively compare the effect of different straightening techniques by applying them to chromosome images prior to classification. Results demonstrate that the proposed methods speed up training and prediction by factors of 83 and 3, respectively, while surpassing the performance of a very competitive baseline built on deep convolutional neural networks.
{"title":"Siamese Networks for Chromosome Classification","authors":"Swati, Gaurav Gupta, Mohit Yadav, Monika Sharma, L. Vig","doi":"10.1109/ICCVW.2017.17","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.17","url":null,"abstract":"Karyotying is the process of pairing and ordering 23 pairs of human chromosomes from cell images on the basis of size, centromere position, and banding pattern. Karyotyping during metaphase is often used by clinical cytogeneticists to analyze human chromosomes for diagnostic purposes. It requires experience, domain expertise and considerable manual effort to efficiently perform karyotyping and diagnosis of various disorders. Therefore, automation or even partial automation is highly desirable to assist technicians and reduce the cognitive load necessary for karyotyping. With these motivations, in this paper, we attempt to develop methods for chromosome classification by borrowing the latest ideas from deep learning. More specifically, we perform straightening on chromosomes and feed them into Siamese Networks to push the embeddings of samples coming from similar labels closer. Further, we propose to perform balanced sampling from the pairwise dataset while selecting dissimilar training pairs for Siamese Networks, and an MLP based prediction on top of the embeddings obtained from the trained Siamese Networks. We perform our experiments on a real world dataset of healthy patients collected from a hospital and exhaustively compare the effect of different straightening techniques, by applying them to chromosome images prior to classification. Results demonstrate that the proposed methods speed up both training and prediction by 83 and 3 folds, respectively; while surpassing the performance of a very competitive baseline created utilizing deep convolutional neural networks.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"31 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125658186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial Attention Improves Object Localization: A Biologically Plausible Neuro-Computational Model for Use in Virtual Reality
A. Jamalian, Julia Bergelt, H. Dinkelbach
DOI: 10.1109/ICCVW.2017.320
Visual attention is a smart mechanism employed by the brain to avoid unnecessary processing and to focus on the most relevant part of the visual scene. It can result in a remarkable reduction in the computational complexity of scene understanding. The two major kinds of top-down visual attention signals are spatial and feature-based attention. The former concerns the locations in the scene that are worth attending to, while the latter is more concerned with the basic features of objects, e.g., color, intensity, and edges. In principle, there are two known sources of spatial attention signals: the Frontal Eye Field (FEF) in the prefrontal cortex and the Lateral Intraparietal Cortex (LIP) in the parietal cortex. In this paper, first, a combined neuro-computational model of the ventral and dorsal streams is introduced, and then it is shown in Virtual Reality (VR) that the spatial attention signal provided by LIP acts as a transsaccadic memory pointer which accelerates object localization.
ViTS: Video Tagging System from Massive Web Multimedia Collections
Delia Fernandez, David Varas, Joan Espadaler, Issey Masuda, Jordi Ferreira, A. Woodward, David Rodriguez, Xavier Giró-i-Nieto, J. C. Riveiro, Elisenda Bou
DOI: 10.1109/ICCVW.2017.48
The popularization of multimedia content on the Web has given rise to the need to automatically understand, index, and retrieve it. In this paper we present ViTS, an automatic Video Tagging System which learns from videos, their web context, and comments shared on social networks. ViTS analyses massive multimedia collections through Internet crawling and maintains a knowledge base that is updated in real time without human supervision. As a result, each video is indexed with a rich set of labels and linked with other related content. ViTS is an industrial product under exploitation with a vocabulary of over 2.5M concepts, capable of indexing more than 150k videos per month. We compare the quality and completeness of our tags with respect to those in the YouTube-8M dataset, and we show how ViTS enhances the semantic annotation of the videos with a larger number of labels (10.04 tags/video), with an accuracy of 80.87%. Extracted tags and video summaries are publicly available.
{"title":"ViTS: Video Tagging System from Massive Web Multimedia Collections","authors":"Delia Fernandez, David Varas, Joan Espadaler, Issey Masuda, Jordi Ferreira, A. Woodward, David Rodriguez, Xavier Giró-i-Nieto, J. C. Riveiro, Elisenda Bou","doi":"10.1109/ICCVW.2017.48","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.48","url":null,"abstract":"The popularization of multimedia content on the Web has arised the need to automatically understand, index and retrieve it. In this paper we present ViTS, an automatic Video Tagging System which learns from videos, their web context and comments shared on social networks. ViTS analyses massive multimedia collections by Internet crawling, and maintains a knowledge base that updates in real time with no need of human supervision. As a result, each video is indexed with a rich set of labels and linked with other related contents. ViTS is an industrial product under exploitation with a vocabulary of over 2.5M concepts, capable of indexing more than 150k videos per month. We compare the quality and completeness of our tags with respect to the ones in the YouTube-8M dataset, and we show how ViTS enhances the semantic annotation of the videos with a larger number of labels (10.04 tags/video), with an accuracy of 80,87%. Extracted tags and video summaries are publicly available.1","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131841558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Biophysical 3D Morphable Model of Face Appearance
S. Alotaibi, W. Smith
DOI: 10.1109/ICCVW.2017.102
Skin colour forms a curved manifold in RGB space. The variations in skin colour are largely caused by variations in the concentration of the pigments melanin and hemoglobin. Hence, linear statistical models of appearance or skin albedo are insufficiently constrained (they can produce implausible skin tones) and lack compactness (they require additional dimensions to linearly approximate a curved manifold). In this paper, we propose to use a biophysical model of skin colouration in order to transform skin colour into a parameter space where linear statistical modelling can take place. Hence, we propose a hybrid of biophysical and statistical modelling. We present a two-parameter spectral model of skin colouration and methods for fitting the model to data captured in a lightstage, and then build our hybrid model on a sample of such registered data. We present face editing results and compare our model against a pure statistical model built directly on textures.
Ladder-Style DenseNets for Semantic Segmentation of Large Natural Images
Josip Krapac, Ivan Kreso, Sinisa Segvic
DOI: 10.1109/ICCVW.2017.37
Recent progress in deep image classification models offers great potential to improve state-of-the-art performance in related computer vision tasks. However, the transition to semantic segmentation is hampered by the strict memory limitations of contemporary GPUs. The extent of feature map caching required by convolutional backpropagation poses significant challenges even for moderately sized PASCAL images, and requires careful architectural consideration when the source resolution is in the megapixel range. To address these concerns, we propose a DenseNet-based ladder-style architecture which is able to deliver high modelling power with very lean representations at the original resolution. The resulting fully convolutional models have few parameters, allow training at megapixel resolution on commodity hardware, and display fair semantic segmentation performance even without ImageNet pre-training. We present experiments on the Cityscapes and PASCAL VOC 2012 datasets and report competitive results.
Double-Task Deep Q-Learning with Multiple Views
Tingzhu Bai, Jianing Yang, Jun Chen, Xian Guo, Xiangsheng Huang, Yu-Ni Yao
DOI: 10.1109/ICCVW.2017.128
Deep reinforcement learning enables autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, the applications of direct deep reinforcement learning have been limited. For complicated robotic systems, these limitations result from the high-dimensional action space, the many degrees of freedom of the robotic system, and the high correlation between images. In this paper we introduce a new definition of the action space and propose a double-task deep Q-Network with multiple views (DMDQN) based on double DQN and dueling DQN. As an extension, we define a multi-task model for more complex jobs. Moreover, a data augmentation policy is applied, which includes auto-sampling and action-overturn. The exploration policy is formed when DMDQN and data augmentation are combined. For the robotic system's steady exploration, we design safety constraints according to the working conditions. Our experiments show that our double-task DQN with multiple views performs better than single-task and single-view models. Combining DMDQN and data augmentation, the robotic system can reach the object through exploration.
{"title":"Double-Task Deep Q-Learning with Multiple Views","authors":"Tingzhu Bai, Jianing Yang, Jun Chen, Xian Guo, Xiangsheng Huang, Yu-Ni Yao","doi":"10.1109/ICCVW.2017.128","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.128","url":null,"abstract":"Deep Reinforcement learning enables autonomous robots to learn large repertories of behavioral skill with minimal human intervention. However, the applications of direct deep reinforcement learning have been restricted. For complicated robotic systems, these limitations result from high dimensional action space, high freedom of robotic system and high correlation between images. In this paper we introduce a new definition of action space and propose a double-task deep Q-Network with multiple views (DMDQN) based on double-DQN and dueling-DQN. For extension, we define multi-task model for more complex jobs. Moreover data augment policy is applied, which includes auto-sampling and action-overturn. The exploration policy is formed when DMDQN and data augment are combined. For robotic system's steady exploration, we designed the safety constraints according to working condition. Our experiments show that our double-task DQN with multiple views performs better than the single-task and single-view model. Combining our DMDQN and data augment, the robotic system can reach the object in an exploration way.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"239 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132821971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Margin Based Semi-Supervised Elastic Embedding for Face Image Analysis
F. Dornaika, Y. E. Traboulsi
DOI: 10.1109/ICCVW.2017.156
This paper introduces a graph-based semi-supervised elastic embedding method, as well as its kernelized version, for face image embedding and classification. The proposed framework combines Flexible Manifold Embedding and non-linear graph-based embedding for semi-supervised learning. In both proposed methods, the non-linear manifold and the mapping (a linear transform for the linear method and the kernel multipliers for the kernelized method) are estimated simultaneously, which overcomes the shortcomings of a cascaded estimation. Unlike many state-of-the-art non-linear embedding approaches, which suffer from the out-of-sample problem, our proposed methods have a direct out-of-sample extension to novel samples. We conduct experiments on the face recognition and image-based face orientation problems on four public databases. These experiments show improvement over state-of-the-art algorithms that are based on label propagation or graph-based semi-supervised embedding.