Fusing Geometry and Appearance for Road Segmentation
Gong Cheng, Yiming Qian, J. Elder
DOI: 10.1109/ICCVW.2017.28

We propose a novel method for fusing geometric and appearance cues for road surface segmentation. Modeling colour cues using Gaussian mixtures allows the fusion to be performed optimally within a Bayesian framework, avoiding ad hoc weights. Adaptation to different scene conditions is accomplished through nearest-neighbour appearance-model selection over a dictionary of mixture models learned from training data, and the thorny problem of selecting the number of components in each mixture is solved through a novel cross-validation approach. Quantitative evaluation reveals that the proposed fusion method significantly improves segmentation accuracy relative to a method that uses geometric cues alone.
Discrepancy-Based Networks for Unsupervised Domain Adaptation: A Comparative Study
G. Csurka, Fabien Baradel, Boris Chidlovskii, S. Clinchant
DOI: 10.1109/ICCVW.2017.312

Domain Adaptation (DA) exploits labeled data and models from similar domains to alleviate the annotation burden when learning a model in a new domain. Our contribution to the field is three-fold. First, we propose a new dataset, LandMarkDA, to study adaptation between landmark place recognition models trained on different artistic image styles, such as photos, paintings, and drawings. LandMarkDA poses new adaptation challenges on which current deep architectures show their limits. Second, we present an experimental study of recent shallow and deep adaptation networks that use Maximum Mean Discrepancy (MMD) to bridge the domain gap. We study different design choices for these models by varying the network architectures and evaluate them on the OFF31 and LandMarkDA collections. We show that shallow networks can still be competitive given appropriate feature extraction. Finally, we benchmark a new DA method that successfully combines artistic image style transfer with deep discrepancy-based networks.
Towards a Spatio-Temporal Atlas of 3D Cellular Parameters During Leaf Morphogenesis
F. Selka, T. Blein, J. Burguet, E. Biot, P. Laufs, P. Andrey
DOI: 10.1109/ICCVW.2017.14

Morphogenesis is a complex process that integrates several mechanisms, from the molecular to the organ scale. In plants, division and growth are the two fundamental cellular mechanisms that drive morphogenesis. However, little is known about how these mechanisms are coordinated to establish functional tissue structure. A fundamental bottleneck is the current lack of techniques to systematically quantify the spatio-temporal evolution of 3D cell morphology during organ growth. Using leaf development as a relevant and challenging model of morphogenesis, we developed a computational framework for cell analysis and quantification from 3D images and for the generation of a 3D cell-shape atlas. Since a remarkable feature of leaf morphogenesis is the formation of a laminar structure, we automatically separate the cells belonging to the two leaf sides in the segmented leaves by applying a clustering algorithm. The performance of the proposed pipeline was experimentally assessed on a dataset of 46 leaves at an early developmental stage.
Learning to Identify While Failing to Discriminate
Jure Sokolić, M. Rodrigues, Qiang Qiu, G. Sapiro
DOI: 10.1109/ICCVW.2017.298

Privacy and fairness are critical in computer vision applications, particularly those involving human identification. Building a universally secure, private, and fair system is practically impossible, since exploiting additional data can reveal private information in the original data. Faced with this challenge, we propose a new line of research in which privacy is learned and used in a closed environment. The goal is to ensure that a given entity, trusted to infer certain information from our data, is blocked from inferring protected information from it. We design a system that learns to succeed at the positive task while simultaneously failing at the negative one, and illustrate this with challenging cases where the positive task (face verification) is harder than the negative one (gender classification). The framework opens the door to privacy and fairness in important closed scenarios, ranging from private data-accumulation companies to law enforcement and hospitals.
Reliable Isometric Point Correspondence from Depth
Emel Küpçü, Y. Yemez
DOI: 10.1109/ICCVW.2017.152

We propose a new iterative isometric point correspondence method that relies on diffusion distance to handle the challenges posed by commodity depth sensors, which usually provide incomplete, noisy surface data exhibiting holes and gaps. We formulate the correspondence problem as finding an optimal partial mapping between two given point sets that minimizes deviation from isometry. Our algorithm starts with an initial rough correspondence between keypoints, obtained via a standard descriptor-matching technique. This initial correspondence is then pruned and updated by iterating a perfect-matching algorithm until convergence, to find as many reliable correspondences as possible. For shapes with intrinsic symmetries, such as human models, we additionally provide a symmetry-aware extension that improves our formulation. Experiments show that our method provides state-of-the-art performance on depth frames exhibiting occlusions, large deformations, and topological noise.
Local Depth Edge Detection in Humans and Deep Neural Networks
Krista A. Ehinger, E. Graf, W. Adams, J. Elder
DOI: 10.1109/ICCVW.2017.316

Distinguishing edges caused by a change in depth from other types of edges is an important problem in early vision. We investigate the performance of humans and computer vision models on this task. We use spherical imagery with ground-truth LiDAR range data to build an objective ground-truth dataset for edge classification. We compare various computational models for distinguishing depth from non-depth edges in small image patches and achieve the best performance (86%) with a convolutional neural network. We investigate human performance on this task in a behavioral experiment and find that humans perform worse than the CNN. Although human and CNN depth responses are correlated, observers' responses are better predicted by other observers than by the CNN. The responses of CNNs and human observers also show slightly different patterns of correlation with low-level edge cues, which suggests that CNNs and human observers may weight these features differently when classifying edges.
Locating Crop Plant Centers from UAV-Based RGB Imagery
Yuhao Chen, Javier Ribera, C. Boomsma, E. Delp
DOI: 10.1109/ICCVW.2017.238

In this paper we propose a method for locating crop plants in Unmanned Aerial Vehicle (UAV) imagery. Finding the location of each plant is a crucial step in deriving and tracking per-plant phenotypic traits. We describe initial work on estimating field crop plant locations: we approach the problem by classifying pixels as plant centers or non-plant centers, using Multiple Instance Learning (MIL) to handle the ambiguity of plant-center labels in the training data. The classification results are then post-processed to estimate the exact location of each crop plant. In experimental evaluation, the method achieved an overall precision of 66% and a recall of 64%.
Detecting Smiles of Young Children via Deep Transfer Learning
Yu Xia, Di Huang, Yunhong Wang
DOI: 10.1109/ICCVW.2017.196

Smile detection is an interesting topic in computer vision and has received increasing attention in recent years. However, the challenge posed by age variation has received little attention so far. In this paper, we first quantify the impact of the discrepancy between infants and adults on a newly collected database. We then formulate this issue as an unsupervised domain adaptation problem and present a deep transfer learning solution, applying state-of-the-art transfer learning methods, namely Deep Adaptation Networks (DAN) and Joint Adaptation Networks (JAN), to two baseline deep models, AlexNet and ResNet. Thanks to DAN and JAN, the knowledge learned by deep models from adults can be transferred to infants, for whom very limited labeled data are available for training. Cross-dataset experiments clearly demonstrate the effectiveness of the proposed approach to smile detection across this age gap.
Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior
Amir Rasouli, Iuliia Kotseruba, John K. Tsotsos
DOI: 10.1109/ICCVW.2017.33

Designing autonomous vehicles suitable for urban environments remains an unresolved problem. One of the major dilemmas faced by autonomous cars is how to understand the intentions of other road users and communicate with them. Existing datasets do not provide the means for such higher-level analysis of traffic scenes. With this in mind, we introduce a novel dataset that, in addition to providing bounding-box information for pedestrian detection, includes behavioral and contextual annotations for the scenes. This allows visual and semantic information to be combined for a better understanding of pedestrians' intentions in various traffic scenarios. We establish baseline approaches for analyzing the data and show that combining visual and contextual information improves prediction of pedestrian crossing intention by at least 20%.
Max-Boost-GAN: Max Operation to Boost Generative Ability of Generative Adversarial Networks
Xinhan Di, Pengqian Yu
DOI: 10.1109/ICCVW.2017.140

Generative adversarial networks (GANs) can be used to learn a generation function that takes a joint probability distribution as input, after which visual samples with semantic properties can be generated from a marginal probability distribution. In this paper, we propose a novel algorithm, Max-Boost-GAN, which is shown to boost the generative ability of GANs when the generation error is upper bounded. Moreover, Max-Boost-GAN can learn generation functions that take two marginal probability distributions as input, so that samples of higher visual quality and variety can be generated from the joint probability distribution. Finally, we propose novel objective functions for achieving convergence when training Max-Boost-GAN. Experiments on the generation of binary digits and RGB human faces show that Max-Boost-GAN achieves boosted generative ability, as expected.