Multiple-instance learning (MIL) is a unique learning problem in which training data labels are available only for collections of objects (called bags) instead of individual objects (called instances). A plethora of approaches have been developed to solve this problem in the past years. Popular methods include the diverse density, MILIS and DD-SVM. While having been widely used, these methods, particularly those in computer vision have attempted fairly sophisticated solutions to solve certain unique and particular configurations of the MIL space. In this paper, we analyze the MIL feature space using modified versions of traditional non-parametric techniques like the Parzen window and k-nearest-neighbour, and develop a learning approach employing distances to k-nearest neighbours of a point in the feature space. We show that these methods work as well, if not better than most recently published methods on benchmark datasets. We compare and contrast our analysis with the well-established diverse-density approach and its variants in recent literature, using benchmark datasets including the Musk, Andrews' and Corel datasets, along with a diabetic retinopathy pathology diagnosis dataset. Experimental results demonstrate that, while enjoying an intuitive interpretation and supporting fast learning, these method have the potential of delivering improved performance even for complex data arising from real-world applications.
{"title":"Simpler Non-Parametric Methods Provide as Good or Better Results to Multiple-Instance Learning","authors":"Ragav Venkatesan, P. S. Chandakkar, Baoxin Li","doi":"10.1109/ICCV.2015.299","DOIUrl":"https://doi.org/10.1109/ICCV.2015.299","url":null,"abstract":"Multiple-instance learning (MIL) is a unique learning problem in which training data labels are available only for collections of objects (called bags) instead of individual objects (called instances). A plethora of approaches have been developed to solve this problem in the past years. Popular methods include the diverse density, MILIS and DD-SVM. While having been widely used, these methods, particularly those in computer vision have attempted fairly sophisticated solutions to solve certain unique and particular configurations of the MIL space. In this paper, we analyze the MIL feature space using modified versions of traditional non-parametric techniques like the Parzen window and k-nearest-neighbour, and develop a learning approach employing distances to k-nearest neighbours of a point in the feature space. We show that these methods work as well, if not better than most recently published methods on benchmark datasets. We compare and contrast our analysis with the well-established diverse-density approach and its variants in recent literature, using benchmark datasets including the Musk, Andrews' and Corel datasets, along with a diabetic retinopathy pathology diagnosis dataset. Experimental results demonstrate that, while enjoying an intuitive interpretation and supporting fast learning, these method have the potential of delivering improved performance even for complex data arising from real-world applications.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"8 1","pages":"2605-2613"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82553768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Helge Rhodin, Nadia Robertini, Christian Richardt, H. Seidel, C. Theobalt
Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a big challenge, since the visibility function that indicates if a surface point is seen from a camera can often not be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable closed-form formulation of surface visibility. In contrast to previous methods, this yields smooth, analytically differentiable, and efficient to optimize pose similarity energies with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of numerical optimization. The underlying idea is a new image formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation.
{"title":"A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation","authors":"Helge Rhodin, Nadia Robertini, Christian Richardt, H. Seidel, C. Theobalt","doi":"10.1109/ICCV.2015.94","DOIUrl":"https://doi.org/10.1109/ICCV.2015.94","url":null,"abstract":"Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a big challenge, since the visibility function that indicates if a surface point is seen from a camera can often not be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable closed-form formulation of surface visibility. In contrast to previous methods, this yields smooth, analytically differentiable, and efficient to optimize pose similarity energies with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of numerical optimization. The underlying idea is a new image formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"60 1","pages":"765-773"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76247922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since its introduction as a powerful graph-based method for image segmentation, the Normalized Cuts (NCuts) algorithm has been generalized to incorporate expert knowledge about how certain pixels or regions should be grouped, or how the resulting segmentation should be biased to be correlated with priors. Previous approaches incorporate hard must-link constraints on how certain pixels should be grouped as well as hard cannot-link constraints on how other pixels should be separated into different groups. In this paper, we reformulate NCuts to allow both sets of constraints to be handled in a soft manner, enabling the user to tune the degree to which the constraints are satisfied. An approximate spectral solution to the reformulated problem exists without requiring explicit construction of a large, dense matrix, hence, computation time is comparable to that of unconstrained NCuts. Using synthetic data and real imagery, we show that soft handling of constraints yields better results than unconstrained NCuts and enables more robust clustering and segmentation than is possible when the constraints are strictly enforced.
{"title":"Semi-Supervised Normalized Cuts for Image Segmentation","authors":"Selene E. Chew, N. Cahill","doi":"10.1109/ICCV.2015.200","DOIUrl":"https://doi.org/10.1109/ICCV.2015.200","url":null,"abstract":"Since its introduction as a powerful graph-based method for image segmentation, the Normalized Cuts (NCuts) algorithm has been generalized to incorporate expert knowledge about how certain pixels or regions should be grouped, or how the resulting segmentation should be biased to be correlated with priors. Previous approaches incorporate hard must-link constraints on how certain pixels should be grouped as well as hard cannot-link constraints on how other pixels should be separated into different groups. In this paper, we reformulate NCuts to allow both sets of constraints to be handled in a soft manner, enabling the user to tune the degree to which the constraints are satisfied. An approximate spectral solution to the reformulated problem exists without requiring explicit construction of a large, dense matrix, hence, computation time is comparable to that of unconstrained NCuts. Using synthetic data and real imagery, we show that soft handling of constraints yields better results than unconstrained NCuts and enables more robust clustering and segmentation than is possible when the constraints are strictly enforced.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"194 1","pages":"1716-1723"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86828157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Debadeepta Dey, V. Ramakrishna, M. Hebert, J. Bagnell
We present a simple approach for producing a small number of structured visual outputs which have high recall, for a variety of tasks including monocular pose estimation and semantic scene segmentation. Current state-of-the-art approaches learn a single model and modify inference procedures to produce a small number of diverse predictions. We take the alternate route of modifying the learning procedure to directly optimize for good, high recall sequences of structured-output predictors. Our approach introduces no new parameters, naturally learns diverse predictions and is not tied to any specific structured learning or inference procedure. We leverage recent advances in the contextual submodular maximization literature to learn a sequence of predictors and empirically demonstrate the simplicity and performance of our approach on multiple challenging vision tasks including achieving state-of-the-art results on multiple predictions for monocular pose-estimation and image foreground/background segmentation.
{"title":"Predicting Multiple Structured Visual Interpretations","authors":"Debadeepta Dey, V. Ramakrishna, M. Hebert, J. Bagnell","doi":"10.1109/ICCV.2015.337","DOIUrl":"https://doi.org/10.1109/ICCV.2015.337","url":null,"abstract":"We present a simple approach for producing a small number of structured visual outputs which have high recall, for a variety of tasks including monocular pose estimation and semantic scene segmentation. Current state-of-the-art approaches learn a single model and modify inference procedures to produce a small number of diverse predictions. We take the alternate route of modifying the learning procedure to directly optimize for good, high recall sequences of structured-output predictors. Our approach introduces no new parameters, naturally learns diverse predictions and is not tied to any specific structured learning or inference procedure. We leverage recent advances in the contextual submodular maximization literature to learn a sequence of predictors and empirically demonstrate the simplicity and performance of our approach on multiple challenging vision tasks including achieving state-of-the-art results on multiple predictions for monocular pose-estimation and image foreground/background segmentation.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"33 1","pages":"2947-2955"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87995521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work we propose a novel approach to the problem of multi-view stereo reconstruction. Building upon the previously proposed PatchMatch stereo and PM-Huber algorithm we introduce an extension to the multi-view scenario that employs an iterative refinement scheme. Our proposed approach uses an extended and robustified volumetric truncated signed distance function representation, which is advantageous for the fusion of refined depth maps and also for raycasting the current reconstruction estimation together with estimated depth normals into arbitrary camera views. We formulate the combined multi-view stereo reconstruction and refinement as a variational optimization problem. The newly introduced plane based smoothing term in the energy formulation is guided by the current reconstruction confidence and the image contents. Further we propose an extension of the PatchMatch scheme with an additional KLT step to avoid unnecessary sampling iterations. Improper camera poses are corrected by a direct image aligment step that performs robust outlier compensation by means of a recently proposed kernel lifting framework. To speed up the optimization of the variational formulation an adapted scheme is used for faster convergence.
{"title":"Variational PatchMatch MultiView Reconstruction and Refinement","authors":"Philipp Heise, B. Jensen, S. Klose, Alois Knoll","doi":"10.1109/ICCV.2015.107","DOIUrl":"https://doi.org/10.1109/ICCV.2015.107","url":null,"abstract":"In this work we propose a novel approach to the problem of multi-view stereo reconstruction. Building upon the previously proposed PatchMatch stereo and PM-Huber algorithm we introduce an extension to the multi-view scenario that employs an iterative refinement scheme. Our proposed approach uses an extended and robustified volumetric truncated signed distance function representation, which is advantageous for the fusion of refined depth maps and also for raycasting the current reconstruction estimation together with estimated depth normals into arbitrary camera views. We formulate the combined multi-view stereo reconstruction and refinement as a variational optimization problem. The newly introduced plane based smoothing term in the energy formulation is guided by the current reconstruction confidence and the image contents. Further we propose an extension of the PatchMatch scheme with an additional KLT step to avoid unnecessary sampling iterations. Improper camera poses are corrected by a direct image aligment step that performs robust outlier compensation by means of a recently proposed kernel lifting framework. To speed up the optimization of the variational formulation an adapted scheme is used for faster convergence.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"32 1","pages":"882-890"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88869164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ferns ensembles offer an accurate and efficient multiclass non-linear classification, commonly at the expense of consuming a large amount of memory. We introduce a two-fold contribution that produces large reductions in their memory consumption. First, an efficient L0 regularised cost optimisation finds a sparse representation of the posterior probabilities in the ensemble by discarding elements with zero contribution to valid responses in the training samples. As a by-product this can produce a prediction accuracy gain that, if required, can be traded for further reductions in memory size and prediction time. Secondly, posterior probabilities are quantised and stored in a memory-friendly sparse data structure. We reported a minimum of 75% memory reduction for different types of classification problems using generative and discriminative ferns ensembles, without increasing prediction time or classification error. For image patch recognition our proposal produced a 90% memory reduction, and improved in several percentage points the prediction accuracy.
{"title":"Improving Ferns Ensembles by Sparsifying and Quantising Posterior Probabilities","authors":"Antonio L. Rodríguez, V. Sequeira","doi":"10.1109/ICCV.2015.467","DOIUrl":"https://doi.org/10.1109/ICCV.2015.467","url":null,"abstract":"Ferns ensembles offer an accurate and efficient multiclass non-linear classification, commonly at the expense of consuming a large amount of memory. We introduce a two-fold contribution that produces large reductions in their memory consumption. First, an efficient L0 regularised cost optimisation finds a sparse representation of the posterior probabilities in the ensemble by discarding elements with zero contribution to valid responses in the training samples. As a by-product this can produce a prediction accuracy gain that, if required, can be traded for further reductions in memory size and prediction time. Secondly, posterior probabilities are quantised and stored in a memory-friendly sparse data structure. We reported a minimum of 75% memory reduction for different types of classification problems using generative and discriminative ferns ensembles, without increasing prediction time or classification error. For image patch recognition our proposal produced a 90% memory reduction, and improved in several percentage points the prediction accuracy.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"188 1","pages":"4103-4111"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83053129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junchi Yan, Hongteng Xu, H. Zha, Xiaokang Yang, Huanxi Liu, Stephen M. Chu
Graph matching has a wide spectrum of real-world applications and in general is known NP-hard. In many vision tasks, one realistic problem arises for finding the global node mappings across a batch of corrupted weighted graphs. This paper is an attempt to connect graph matching, especially multi-graph matching to the matrix decomposition model and its relevant on-the-shelf convex optimization algorithms. Our method aims to extract the common inliers and their synchronized permutations from disordered weighted graphs in the presence of deformation and outliers. Under the proposed framework, several variants can be derived in the hope of accommodating to other types of noises. Experimental results on both synthetic data and real images empirically show that the proposed paradigm exhibits several interesting behaviors and in many cases performs competitively with the state-of-the-arts.
{"title":"A Matrix Decomposition Perspective to Multiple Graph Matching","authors":"Junchi Yan, Hongteng Xu, H. Zha, Xiaokang Yang, Huanxi Liu, Stephen M. Chu","doi":"10.1109/ICCV.2015.31","DOIUrl":"https://doi.org/10.1109/ICCV.2015.31","url":null,"abstract":"Graph matching has a wide spectrum of real-world applications and in general is known NP-hard. In many vision tasks, one realistic problem arises for finding the global node mappings across a batch of corrupted weighted graphs. This paper is an attempt to connect graph matching, especially multi-graph matching to the matrix decomposition model and its relevant on-the-shelf convex optimization algorithms. Our method aims to extract the common inliers and their synchronized permutations from disordered weighted graphs in the presence of deformation and outliers. Under the proposed framework, several variants can be derived in the hope of accommodating to other types of noises. Experimental results on both synthetic data and real images empirically show that the proposed paradigm exhibits several interesting behaviors and in many cases performs competitively with the state-of-the-arts.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"115 1","pages":"199-207"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79606542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Kontschieder, M. Fiterau, A. Criminisi, S. R. Bulò
We present Deep Neural Decision Forests - a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain Top5-Errors of only 7.84%/6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we are improving on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).
{"title":"Deep Neural Decision Forests","authors":"P. Kontschieder, M. Fiterau, A. Criminisi, S. R. Bulò","doi":"10.1109/ICCV.2015.172","DOIUrl":"https://doi.org/10.1109/ICCV.2015.172","url":null,"abstract":"We present Deep Neural Decision Forests - a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain Top5-Errors of only 7.84%/6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we are improving on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"6 1","pages":"1467-1475"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79633308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyunwoo J. Kim, N. Adluru, Monami Banerjee, B. Vemuri, Vikas Singh
Probability density functions (PDFs) are fundamental "objects" in mathematics with numerous applications in computer vision, machine learning and medical imaging. The feasibility of basic operations such as computing the distance between two PDFs and estimating a mean of a set of PDFs is a direct function of the representation we choose to work with. In this paper, we study the Gaussian mixture model (GMM) representation of the PDFs motivated by its numerous attractive features. (1) GMMs are arguably more interpretable than, say, square root parameterizations (2) the model complexity can be explicitly controlled by the number of components and (3) they are already widely used in many applications. The main contributions of this paper are numerical algorithms to enable basic operations on such objects that strictly respect their underlying geometry. For instance, when operating with a set of k component GMMs, a first order expectation is that the result of simple operations like interpolation and averaging should provide an object that is also a k component GMM. The literature provides very little guidance on enforcing such requirements systematically. It turns out that these tasks are important internal modules for analysis and processing of a field of ensemble average propagators (EAPs), common in diffusion weighted magnetic resonance imaging. We provide proof of principle experiments showing how the proposed algorithms for interpolation can facilitate statistical analysis of such data, essential to many neuroimaging studies. Separately, we also derive interesting connections of our algorithm with functional spaces of Gaussians, that may be of independent interest.
{"title":"Interpolation on the Manifold of K Component GMMs","authors":"Hyunwoo J. Kim, N. Adluru, Monami Banerjee, B. Vemuri, Vikas Singh","doi":"10.1109/ICCV.2015.330","DOIUrl":"https://doi.org/10.1109/ICCV.2015.330","url":null,"abstract":"Probability density functions (PDFs) are fundamental \"objects\" in mathematics with numerous applications in computer vision, machine learning and medical imaging. The feasibility of basic operations such as computing the distance between two PDFs and estimating a mean of a set of PDFs is a direct function of the representation we choose to work with. In this paper, we study the Gaussian mixture model (GMM) representation of the PDFs motivated by its numerous attractive features. (1) GMMs are arguably more interpretable than, say, square root parameterizations (2) the model complexity can be explicitly controlled by the number of components and (3) they are already widely used in many applications. The main contributions of this paper are numerical algorithms to enable basic operations on such objects that strictly respect their underlying geometry. For instance, when operating with a set of k component GMMs, a first order expectation is that the result of simple operations like interpolation and averaging should provide an object that is also a k component GMM. The literature provides very little guidance on enforcing such requirements systematically. It turns out that these tasks are important internal modules for analysis and processing of a field of ensemble average propagators (EAPs), common in diffusion weighted magnetic resonance imaging. We provide proof of principle experiments showing how the proposed algorithms for interpolation can facilitate statistical analysis of such data, essential to many neuroimaging studies. Separately, we also derive interesting connections of our algorithm with functional spaces of Gaussians, that may be of independent interest.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"222 1","pages":"2884-2892"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83480056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this article we explore the problem of constructing person-specific models for the detection of facial Action Units (AUs), addressing the problem from the point of view of Transfer Learning and Multi-Task Learning. Our starting point is the fact that some expressions, such as smiles, are very easily elicited, annotated, and automatically detected, while others are much harder to elicit and to annotate. We thus consider a novel problem: all AU models for the target subject are to be learnt using person-specific annotated data for a reference AU (AU12 in our case), and no data or little data regarding the target AU. In order to design such a model, we propose a novel Multi-Task Learning and the associated Transfer Learning framework, in which we consider both relations across subjects and AUs. That is to say, we consider a tensor structure among the tasks. Our approach hinges on learning the latent relations among tasks using one single reference AU, and then transferring these latent relations to other AUs. We show that we are able to effectively make use of the annotated data for AU12 when learning other person-specific AU models, even in the absence of data for the target task. Finally, we show the excellent performance of our method when small amounts of annotated data for the target tasks are made available.
{"title":"Learning to Transfer: Transferring Latent Task Structures and Its Application to Person-Specific Facial Action Unit Detection","authors":"Timur R. Almaev, Brais Martínez, M. Valstar","doi":"10.1109/ICCV.2015.430","DOIUrl":"https://doi.org/10.1109/ICCV.2015.430","url":null,"abstract":"In this article we explore the problem of constructing person-specific models for the detection of facial Action Units (AUs), addressing the problem from the point of view of Transfer Learning and Multi-Task Learning. Our starting point is the fact that some expressions, such as smiles, are very easily elicited, annotated, and automatically detected, while others are much harder to elicit and to annotate. We thus consider a novel problem: all AU models for the target subject are to be learnt using person-specific annotated data for a reference AU (AU12 in our case), and no data or little data regarding the target AU. In order to design such a model, we propose a novel Multi-Task Learning and the associated Transfer Learning framework, in which we consider both relations across subjects and AUs. That is to say, we consider a tensor structure among the tasks. Our approach hinges on learning the latent relations among tasks using one single reference AU, and then transferring these latent relations to other AUs. We show that we are able to effectively make use of the annotated data for AU12 when learning other person-specific AU models, even in the absence of data for the target task. Finally, we show the excellent performance of our method when small amounts of annotated data for the target tasks are made available.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"19 2 1","pages":"3774-3782"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83553064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}