Computer Vision for the Visually Impaired: The Sound of Vision System
S. Caraiman, A. Morar, Mateusz Owczarek, A. Burlacu, D. Rzeszotarski, N. Botezatu, P. Herghelegiu, F. Moldoveanu, P. Strumiłło, A. Moldoveanu
This paper presents a computer vision-based sensory substitution device for the visually impaired. Its main objective is to provide users with a 3D representation of the environment around them, conveyed by means of the hearing and tactile senses. One of the biggest challenges for such a system is pervasiveness, i.e., being usable in any indoor or outdoor environment and under any illumination conditions. This work describes both the hardware (3D acquisition system) and software (3D processing pipeline) used for developing this sensory substitution device and provides insight into its use in various scenarios. Preliminary experiments with blind users showed good usability and provided valuable feedback for system improvement.
{"title":"Computer Vision for the Visually Impaired: the Sound of Vision System","authors":"S. Caraiman, A. Morar, Mateusz Owczarek, A. Burlacu, D. Rzeszotarski, N. Botezatu, P. Herghelegiu, F. Moldoveanu, P. Strumiłło, A. Moldoveanu","doi":"10.1109/ICCVW.2017.175","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.175","url":null,"abstract":"This paper presents a computer vision based sensory substitution device for the visually impaired. Its main objective is to provide the users with a 3D representation of the environment around them, conveyed by means of the hearing and tactile senses. One of the biggest challenges for this system is to ensure pervasiveness, i.e., to be usable in any indoor or outdoor environments and in any illumination conditions. This work reveals both the hardware (3D acquisition system) and software (3D processing pipeline) used for developing this sensory substitution device and provides insight on its exploitation in various scenarios. Preliminary experiments with blind users revealed good usability results and provided valuable feedback for system improvement.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133349540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic Mode Decomposition for Background Modeling
Seth D. Pendergrass, S. Brunton, J. Kutz, N. Benjamin Erichson, T. Askham
The Dynamic Mode Decomposition (DMD) is a spatiotemporal matrix decomposition method capable of background modeling in video streams. DMD is a regression technique that integrates Fourier transforms and the singular value decomposition. Innovations in compressed sensing allow for a rapid decomposition of video streams whose cost scales with the intrinsic rank of the matrix rather than with the size of the actual video. Our results show that the quality of the resulting background model is competitive, as quantified by the F-measure, recall, and precision. A GPU (graphics processing unit) accelerated implementation is also possible, allowing the algorithm to operate efficiently on streaming data. In addition, it is possible to leverage the native compressed format of many data streams, such as HD video and computational physics codes that are represented sparsely in the Fourier domain, to massively reduce data transfer from CPU to GPU and to enable sparse matrix multiplications.
{"title":"Dynamic Mode Decomposition for Background Modeling","authors":"Seth D. Pendergrass, S. Brunton, J. Kutz, N. Benjamin Erichson, T. Askham","doi":"10.1109/ICCVW.2017.220","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.220","url":null,"abstract":"The Dynamic Mode Decomposition (DMD) is a spatiotemporal matrix decomposition method capable of background modeling in video streams. DMD is a regression technique that integrates Fourier transforms and singular value decomposition. Innovations in compressed sensing allow for a scalable and rapid decomposition of video streams that scales with the intrinsic rank of the matrix, rather than the size of the actual video. Our results show that the quality of the resulting background model is competitive, quantified by the F-measure, recall and precision. A GPU (graphics processing unit) accelerated implementation is also possible allowing the algorithm to operate efficiently on streaming data. In addition, it is possible to leverage the native compressed format of many data streams, such as HD video and computational physics codes that are represented sparsely in the Fourier domain, to massively reduce data transfer from CPU to GPU and to enable sparse matrix multiplications.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133647232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PVNN: A Neural Network Library for Photometric Vision
Ye Yu, W. Smith
In this paper we show how a differentiable, physics-based renderer suitable for photometric vision tasks can be implemented as layers in a deep neural network. The layers include geometric operations for representation transformations, reflectance evaluations with arbitrary numbers of light sources, and statistical bidirectional reflectance distribution function (BRDF) models. We make an implementation of these layers available as a neural network library (PVNN) for Theano. The layers can be incorporated into any neural network architecture, allowing parts of the photometric image formation process to be explicitly modelled in a network that is trained end to end via backpropagation. As an exemplar application, we show how to train a network with an encoder-decoder architecture that learns to estimate BRDF parameters from a single image in an unsupervised manner.
{"title":"PVNN: A Neural Network Library for Photometric Vision","authors":"Ye Yu, W. Smith","doi":"10.1109/ICCVW.2017.69","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.69","url":null,"abstract":"In this paper we show how a differentiable, physics-based renderer suitable for photometric vision tasks can be implemented as layers in a deep neural network. The layers include geometric operations for representation transformations, reflectance evaluations with arbitrary numbers of light sources and statistical bidirectional reflectance distribution function (BRDF) models. We make an implementation of these layers available as a neural network library (PVNN) for Theano. The layers can be incorporated into any neural network architecture, allowing parts of the photometric image formation process to be explicitly modelled in a network that is trained end to end via backpropagation. As an exemplar application, we show how to train a network with encoder-decoder architecture that learns to estimate BRDF parameters from a single image in an unsupervised manner.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133109687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shape-from-Polarisation: A Nonlinear Least Squares Approach
Ye Yu, Dizhong Zhu, W. Smith
In this paper we present a new approach for estimating surface height from polarimetric data, i.e. a sequence of images in which a linear polarising filter is rotated in front of a camera. In contrast to all previous shape-from-polarisation methods, we do not first transform the observed data into a polarisation image. Instead, we minimise the sum of squared residuals between predicted and observed intensities over all pixels and polariser angles. This is a nonlinear least squares optimisation problem in which the unknown is the surface height. The forward prediction is a series of transformations for which we provide analytical derivatives, allowing the overall problem to be efficiently optimised using Gauss-Newton type methods with an analytical Jacobian matrix. The method is very general and can incorporate any (differentiable) illumination, reflectance or polarisation model. We also propose a variant of the method that uses image ratios to remove the dependence on illumination and albedo. We demonstrate our methods on glossy objects, including ones with albedo variations, and provide a comparison to a state-of-the-art approach.
{"title":"Shape-from-Polarisation: A Nonlinear Least Squares Approach","authors":"Ye Yu, Dizhong Zhu, W. Smith","doi":"10.1109/ICCVW.2017.350","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.350","url":null,"abstract":"In this paper we present a new type of approach for estimating surface height from polarimetric data, i.e. a sequence of images in which a linear polarising filter is rotated in front of a camera. In contrast to all previous shape-from-polarisation methods, we do not first transform the observed data into a polarisation image. Instead, we minimise the sum of squared residuals between predicted and observed intensities over all pixels and polariser angles. This is a nonlinear least squares optimisation problem in which the unknown is the surface height. The forward prediction is a series of transformations for which we provide analytical derivatives allowing the overall problem to be efficiently optimised using Gauss-Newton type methods with an analytical Jacobian matrix. The method is very general and can incorporate any (differentiable) illumination, reflectance or polarisation model. We also propose a variant of the method which uses image ratios to remove dependence on illumination and albedo. We demonstrate our methods on glossy objects, including with albedo variations, and provide comparison to a state of the art approach.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"159 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123080184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discrepancy-Based Networks for Unsupervised Domain Adaptation: A Comparative Study
G. Csurka, Fabien Baradel, Boris Chidlovskii, S. Clinchant
Domain Adaptation (DA) exploits labeled data and models from similar domains in order to alleviate the annotation burden when learning a model in a new domain. Our contribution to the field is threefold. First, we propose a new dataset, LandMarkDA, to study the adaptation of landmark place recognition models trained on different artistic image styles, such as photos, paintings, and drawings. LandMarkDA poses new adaptation challenges on which current deep architectures show their limits. Second, we propose an experimental study of recent shallow and deep adaptation networks that use Maximum Mean Discrepancy (MMD) to bridge the domain gap. We study different design choices for these models by varying the network architectures and evaluate them on the OFF31 and new LandMarkDA collections. We show that shallow networks can still be competitive given appropriate feature extraction. Finally, we also benchmark a new DA method that successfully combines artistic image style transfer with deep discrepancy-based networks.
{"title":"Discrepancy-Based Networks for Unsupervised Domain Adaptation: A Comparative Study","authors":"G. Csurka, Fabien Baradel, Boris Chidlovskii, S. Clinchant","doi":"10.1109/ICCVW.2017.312","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.312","url":null,"abstract":"Domain Adaptation (DA) exploits labeled data and models from similar domains in order to alleviate the annotation burden when learning a model in a new domain. Our contribution to the field is three-fold. First, we propose a new dataset, LandMarkDA, to study the adaptation between landmark place recognition models trained with different artistic image styles, such as photos, paintings and drawings. The new LandMarkDA proposes new adaptation challenges, where current deep architectures show their limits. Second, we propose an experimental study of recent shallow and deep adaptation networks, based on using Maximum Mean Discrepancy to bridge the domain gap. We study different design choices for these models by varying the network architectures and evaluate them on OFF31 and the new LandMarkDA collections. We show that shallow networks can still be competitive under an appropriate feature extraction. Finally, we also benchmark a new DA method that successfully combines the artistic image style-transfer with deep discrepancy-based networks.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123312387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards a Spatio-Temporal Atlas of 3D Cellular Parameters During Leaf Morphogenesis
F. Selka, T. Blein, J. Burguet, E. Biot, P. Laufs, P. Andrey
Morphogenesis is a complex process that integrates several mechanisms, from the molecular to the organ scale. In plants, division and growth are the two fundamental cellular mechanisms that drive morphogenesis. However, little is known about how these mechanisms are coordinated to establish functional tissue structure. A fundamental bottleneck is the current lack of techniques to systematically quantify the spatio-temporal evolution of 3D cell morphology during organ growth. Using leaf development as a relevant and challenging model for studying morphogenesis, we developed a computational framework for cell analysis and quantification from 3D images and for the generation of a 3D cell shape atlas. Since a remarkable feature of leaf morphogenesis is the formation of a laminar structure, we propose to automatically separate the cells belonging to the two leaf sides in the segmented leaves by applying a clustering algorithm. The performance of the proposed pipeline was experimentally assessed on a dataset of 46 leaves at an early developmental stage.
{"title":"Towards a Spatio-Temporal Atlas of 3D Cellular Parameters During Leaf Morphogenesis","authors":"F. Selka, T. Blein, J. Burguet, E. Biot, P. Laufs, P. Andrey","doi":"10.1109/ICCVW.2017.14","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.14","url":null,"abstract":"Morphogenesis is a complex process that integrates several mechanisms from the molecular to the organ scales. In plants, division and growth are the two fundamental cellular mechanisms that drive morphogenesis. However, little is known about how these mechanisms are coordinated to establish functional tissue structure. A fundamental bottleneck is the current lack of techniques to systematically quantify the spatio-temporal evolution of 3D cell morphology during organ growth. Using leaf development as a relevant and challenging model to study morphogenesis, we developed a computational framework for cell analysis and quantification from 3D images and for the generation of 3D cell shape atlas. A remarkable feature of leaf morphogenesis being the formation of a laminar-like structure, we propose to automatically separate the cells corresponding to the leaf sides in the segmented leaves, by applying a clustering algorithm. The performance of the proposed pipeline was experimentally assessed on a dataset of 46 leaves in an early developmental state.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123391354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliable Isometric Point Correspondence from Depth
Emel Küpçü, Y. Yemez
We propose a new iterative isometric point correspondence method that relies on diffusion distance to handle the challenges posed by commodity depth sensors, which usually provide incomplete and noisy surface data exhibiting holes and gaps. We formulate the correspondence problem as finding an optimal partial mapping between two given point sets that minimizes deviation from isometry. Our algorithm starts with an initial rough correspondence between keypoints, obtained via a standard descriptor matching technique. This initial correspondence is then pruned and updated by iterating a perfect matching algorithm until convergence, so as to find as many reliable correspondences as possible. For shapes with intrinsic symmetries, such as human models, we additionally provide a symmetry-aware extension to improve our formulation. Experiments show that our method provides state-of-the-art performance on depth frames exhibiting occlusions, large deformations, and topological noise.
{"title":"Reliable Isometric Point Correspondence from Depth","authors":"Emel Küpçü, Y. Yemez","doi":"10.1109/ICCVW.2017.152","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.152","url":null,"abstract":"We propose a new iterative isometric point correspondence method that relies on diffusion distance to handle challenges posed by commodity depth sensors, which usually provide incomplete and noisy surface data exhibiting holes and gaps. We formulate the correspondence problem as finding an optimal partial mapping between two given point sets, that minimizes deviation from isometry. Our algorithm starts with an initial rough correspondence between keypoints, obtained via a standard descriptor matching technique. This initial correspondence is then pruned and updated by iterating a perfect matching algorithm until convergence to find as many reliable correspondences as possible. For shapes with intrinsic symmetries such as human models, we additionally provide a symmetry aware extension to improve our formulation. The experiments show that our method provides state of the art performance over depth frames exhibiting occlusions, large deformations and topological noise.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123994489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning to Identify While Failing to Discriminate
Jure Sokolić, M. Rodrigues, Qiang Qiu, G. Sapiro
Privacy and fairness are critical in computer vision applications, in particular when dealing with human identification. Achieving a universally secure, private, and fair system is practically impossible, as the exploitation of additional data can reveal private information in the original data. Faced with this challenge, we propose a new line of research in which privacy is learned and used in a closed environment. The goal is to ensure that a given entity, trusted to infer certain information from our data, is blocked from inferring protected information from it. We design a system that learns to succeed on the positive task while simultaneously failing at the negative one, and illustrate this with challenging cases where the positive task (face verification) is harder than the negative one (gender classification). The framework opens the door to privacy and fairness in very important closed scenarios, ranging from private data accumulation companies to law enforcement and hospitals.
{"title":"Learning to Identify While Failing to Discriminate","authors":"Jure Sokolić, M. Rodrigues, Qiang Qiu, G. Sapiro","doi":"10.1109/ICCVW.2017.298","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.298","url":null,"abstract":"Privacy and fairness are critical in computer vision applications, in particular when dealing with human identification. Achieving a universally secure, private, and fair systems is practically impossible as the exploitation of additional data can reveal private information in the original one. Faced with this challenge, we propose a new line of research, where the privacy is learned and used in a closed environment. The goal is to ensure that a given entity, trusted to infer certain information with our data, is blocked from inferring protected information from it. We design a system that learns to succeed on the positive task while simultaneously fail at the negative one, and illustrate this with challenging cases where the positive task (face verification) is harder than the negative one (gender classification). The framework opens the door to privacy and fairness in very important closed scenarios, ranging from private data accumulation companies to law-enforcement and hospitals.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125128276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Siamese Networks for Chromosome Classification
Swati, Gaurav Gupta, Mohit Yadav, Monika Sharma, L. Vig
Karyotyping is the process of pairing and ordering the 23 pairs of human chromosomes from cell images on the basis of size, centromere position, and banding pattern. Karyotyping during metaphase is often used by clinical cytogeneticists to analyze human chromosomes for diagnostic purposes. Performing karyotyping and diagnosing the various disorders efficiently requires experience, domain expertise, and considerable manual effort. Therefore, automation, or even partial automation, is highly desirable to assist technicians and reduce the cognitive load necessary for karyotyping. With these motivations, in this paper we develop methods for chromosome classification that borrow the latest ideas from deep learning. More specifically, we straighten the chromosomes and feed them into Siamese networks to push the embeddings of samples with similar labels closer together. Further, we propose balanced sampling from the pairwise dataset when selecting dissimilar training pairs for the Siamese networks, and an MLP-based prediction on top of the embeddings obtained from the trained Siamese networks. We perform our experiments on a real-world dataset of healthy patients collected from a hospital and exhaustively compare the effect of different straightening techniques applied to chromosome images prior to classification. Results demonstrate that the proposed methods speed up training and prediction by factors of 83 and 3, respectively, while surpassing the performance of a very competitive baseline built on deep convolutional neural networks.
{"title":"Siamese Networks for Chromosome Classification","authors":"Swati, Gaurav Gupta, Mohit Yadav, Monika Sharma, L. Vig","doi":"10.1109/ICCVW.2017.17","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.17","url":null,"abstract":"Karyotying is the process of pairing and ordering 23 pairs of human chromosomes from cell images on the basis of size, centromere position, and banding pattern. Karyotyping during metaphase is often used by clinical cytogeneticists to analyze human chromosomes for diagnostic purposes. It requires experience, domain expertise and considerable manual effort to efficiently perform karyotyping and diagnosis of various disorders. Therefore, automation or even partial automation is highly desirable to assist technicians and reduce the cognitive load necessary for karyotyping. With these motivations, in this paper, we attempt to develop methods for chromosome classification by borrowing the latest ideas from deep learning. More specifically, we perform straightening on chromosomes and feed them into Siamese Networks to push the embeddings of samples coming from similar labels closer. Further, we propose to perform balanced sampling from the pairwise dataset while selecting dissimilar training pairs for Siamese Networks, and an MLP based prediction on top of the embeddings obtained from the trained Siamese Networks. We perform our experiments on a real world dataset of healthy patients collected from a hospital and exhaustively compare the effect of different straightening techniques, by applying them to chromosome images prior to classification. Results demonstrate that the proposed methods speed up both training and prediction by 83 and 3 folds, respectively; while surpassing the performance of a very competitive baseline created utilizing deep convolutional neural networks.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"31 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125658186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting Smiles of Young Children via Deep Transfer Learning
Yu Xia, Di Huang, Yunhong Wang
Smile detection is an interesting topic in computer vision and has received increasing attention in recent years. However, the challenge posed by age variations has not been sufficiently addressed before. In this paper, we first quantify the impact of the discrepancy between infants and adults on a newly collected database. We then formulate this issue as an unsupervised domain adaptation problem and present a deep transfer learning solution, which applies the state-of-the-art transfer learning methods Deep Adaptation Networks (DAN) and Joint Adaptation Networks (JAN) to two baseline deep models, i.e. AlexNet and ResNet. Thanks to DAN and JAN, the knowledge learned by deep models from adults can be transferred to infants, for whom very limited labeled data are available for training. Cross-dataset experiments are conducted and the results clearly demonstrate the effectiveness of the proposed approach to smile detection across such an age gap.
{"title":"Detecting Smiles of Young Children via Deep Transfer Learning","authors":"Yu Xia, Di Huang, Yunhong Wang","doi":"10.1109/ICCVW.2017.196","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.196","url":null,"abstract":"Smile detection is an interesting topic in computer vision and has received increasing attention in recent years. However, the challenge caused by age variations has not been sufficiently focused on before. In this paper, we first highlight the impact of the discrepancy between infants and adults in a quantitative way on a newly collected database. We then formulate this issue as an unsupervised domain adaptation problem and present the solution of deep transfer learning, which applies the state of the art transfer learning methods, namely Deep Adaptation Networks (DAN) and Joint Adaptation Network (JAN), to two baseline deep models, i.e. AlexNet and ResNet. Thanks to DAN and JAN, the knowledge learned by deep models from adults can be transferred to infants, where very limited labeled data are available for training. Cross-dataset experiments are conducted and the results evidently demonstrate the effectiveness of the proposed approach to smile detection across such an age gap.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128278314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}