Riemannian Nonlinear Mixed Effects Models: Analyzing Longitudinal Deformations in Neuroimaging
Pub Date: 2017-07-01 | Epub Date: 2017-11-09 | DOI: 10.1109/CVPR.2017.612
Hyunwoo J Kim, Nagesh Adluru, Heemanshu Suri, Baba C Vemuri, Sterling C Johnson, Vikas Singh
Statistical machine learning models that operate on manifold-valued data are being extensively studied in vision, motivated by applications in activity recognition, feature tracking, and medical imaging. While non-parametric methods have been relatively well studied in the literature, efficient formulations for parametric models (which may offer benefits in small sample size regimes) have only emerged recently. So far, manifold-valued regression models (such as geodesic regression) have been restricted to the analysis of cross-sectional data, i.e., the so-called "fixed effects" setting in statistics. But in most "longitudinal" analyses (e.g., when a participant provides multiple measurements over time), the application of fixed effects models is problematic. To address this need, this paper generalizes non-linear mixed effects models to the regime where the response variable is manifold-valued, i.e., f : R^d → ℳ. We derive the underlying model and estimation schemes and demonstrate the immediate benefits such a model can provide, both for group-level and individual-level analysis, on longitudinal brain imaging data. The direct consequence of our results is that longitudinal analysis of manifold-valued measurements (especially on the manifold of symmetric positive definite matrices) can be conducted in a computationally tractable manner.
{"title":"Riemannian Nonlinear Mixed Effects Models: Analyzing Longitudinal Deformations in Neuroimaging.","authors":"Hyunwoo J Kim, Nagesh Adluru, Heemanshu Suri, Baba C Vemuri, Sterling C Johnson, Vikas Singh","doi":"10.1109/CVPR.2017.612","DOIUrl":"10.1109/CVPR.2017.612","url":null,"abstract":"<p><p>Statistical machine learning models that operate on manifold-valued data are being extensively studied in vision, motivated by applications in activity recognition, feature tracking and medical imaging. While non-parametric methods have been relatively well studied in the literature, efficient formulations for parametric models (which may offer benefits in small sample size regimes) have only emerged recently. So far, manifold-valued regression models (such as geodesic regression) are restricted to the analysis of cross-sectional data, i.e., the so-called \"fixed effects\" in statistics. But in most \"longitudinal analysis\" (e.g., when a participant provides multiple measurements, over time) the application of fixed effects models is problematic. In an effort to answer this need, this paper generalizes non-linear mixed effects model to the regime where the response variable is manifold-valued, i.e., f : <b>R</b><sup>d</sup> → ℳ. We derive the underlying model and estimation schemes and demonstrate the immediate benefits such a model can provide - both for group level and individual level analysis - on longitudinal brain imaging data. The direct consequence of our results is that longitudinal analysis of manifold-valued measurements (especially, the symmetric positive definite manifold) can be conducted in a computationally tractable manner.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2017 ","pages":"5777-5786"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5805155/pdf/nihms914461.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35820235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Incremental Multiresolution Matrix Factorization Algorithm
Pub Date: 2017-07-01 | DOI: 10.1109/CVPR.2017.81
Vamsi K Ithapu, Risi Kondor, Sterling C Johnson, Vikas Singh
Multiresolution analysis and matrix factorization are foundational tools in computer vision. In this work, we study the interface between these two distinct topics and obtain techniques to uncover hierarchical block structure in symmetric matrices - an important aspect of the success of many vision problems. Our new algorithm, the incremental multiresolution matrix factorization, uncovers such structure one feature at a time, and hence scales well to large matrices. We describe how this multiscale analysis goes much further than what a direct "global" factorization of the data can identify. We evaluate the efficacy of the resulting factorizations for relative leveraging within regression tasks using medical imaging data. We also use the factorization on representations learned by popular deep networks, providing evidence of their ability to infer semantic relationships even when they are not explicitly trained to do so. We show that this algorithm can be used as an exploratory tool to improve the network architecture, and within numerous other settings in vision.
{"title":"The Incremental Multiresolution Matrix Factorization Algorithm.","authors":"Vamsi K Ithapu, Risi Kondor, Sterling C Johnson, Vikas Singh","doi":"10.1109/CVPR.2017.81","DOIUrl":"10.1109/CVPR.2017.81","url":null,"abstract":"<p><p>Multiresolution analysis and matrix factorization are foundational tools in computer vision. In this work, we study the interface between these two distinct topics and obtain techniques to uncover hierarchical block structure in symmetric matrices - an important aspect in the success of many vision problems. Our new algorithm, the incremental multiresolution matrix factorization, uncovers such structure one feature at a time, and hence scales well to large matrices. We describe how this multiscale analysis goes much farther than what a direct \"global\" factorization of the data can identify. We evaluate the efficacy of the resulting factorizations for relative leveraging within regression tasks using medical imaging data. We also use the factorization on representations learned by popular deep networks, providing evidence of their ability to infer semantic relationships even when they are not explicitly trained to do so. We show that this algorithm can be used as an exploratory tool to improve the network architecture, and within numerous other settings in vision.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2017 ","pages":"692-701"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5798492/pdf/nihms914456.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35807726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fine-tuning Convolutional Neural Networks for Biomedical Image Analysis: Actively and Incrementally
Pub Date: 2017-07-01 | Epub Date: 2017-11-09 | DOI: 10.1109/CVPR.2017.506
Zongwei Zhou, Jae Shin, Lei Zhang, Suryakanth Gurudu, Michael Gotway, Jianming Liang
Intense interest in applying convolutional neural networks (CNNs) in biomedical image analysis is widespread, but its success is impeded by the lack of large annotated datasets in biomedical imaging. Annotating biomedical images is not only tedious and time consuming, but also demands costly, specialty-oriented knowledge and skills that are not easily accessible. To dramatically reduce annotation cost, this paper presents a novel method called AIFT (active, incremental fine-tuning) that naturally integrates active learning and transfer learning into a single framework. AIFT starts directly with a pre-trained CNN to seek "worthy" samples from the unannotated pool for annotation, and the (fine-tuned) CNN is further fine-tuned continuously by incorporating newly annotated samples in each iteration to enhance the CNN's performance incrementally. We have evaluated our method in three different biomedical imaging applications, demonstrating that the cost of annotation can be cut by at least half. This performance is attributed to several advantages derived from the advanced active and incremental capability of our AIFT method.
{"title":"Fine-tuning Convolutional Neural Networks for Biomedical Image Analysis: Actively and Incrementally.","authors":"Zongwei Zhou, Jae Shin, Lei Zhang, Suryakanth Gurudu, Michael Gotway, Jianming Liang","doi":"10.1109/CVPR.2017.506","DOIUrl":"10.1109/CVPR.2017.506","url":null,"abstract":"<p><p>Intense interest in applying convolutional neural networks (CNNs) in biomedical image analysis is wide spread, but its success is impeded by the lack of large annotated datasets in biomedical imaging. Annotating biomedical images is not only tedious and time consuming, but also demanding of costly, specialty-oriented knowledge and skills, which are not easily accessible. To dramatically reduce annotation cost, this paper presents a novel method called AIFT (active, incremental fine-tuning) to naturally integrate active learning and transfer learning into a single framework. AIFT starts directly with a pre-trained CNN to seek \"worthy\" samples from the unannotated for annotation, and the (fine-tuned) CNN is further fine-tuned continuously by incorporating newly annotated samples in each iteration to enhance the CNN's performance incrementally. We have evaluated our method in three different biomedical imaging applications, demonstrating that the cost of annotation can be cut by at least half. This performance is attributed to the several advantages derived from the advanced active and incremental capability of our AIFT method.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2017 ","pages":"4761-4772"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191179/pdf/nihms958486.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36597835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Latent Variable Graphical Model Selection using Harmonic Analysis: Applications to the Human Connectome Project (HCP)
Pub Date: 2016-06-01 | Epub Date: 2016-12-12 | DOI: 10.1109/CVPR.2016.268
Won Hwa Kim, Hyunwoo J Kim, Nagesh Adluru, Vikas Singh
A major goal of imaging studies such as the (ongoing) Human Connectome Project (HCP) is to characterize the structural network map of the human brain and identify its associations with covariates, such as genotype and risk factors, that correspond to an individual. But the set of image-derived measures and the set of covariates are both large, so we must first estimate a 'parsimonious' set of relations between the measurements. For instance, a Gaussian graphical model will show conditional independences between the random variables, which can then be used to set up specific downstream analyses. But most such data involve a large list of 'latent' variables that remain unobserved, yet affect the 'observed' variables substantially. Accounting for such latent variables is not directly addressed by standard precision matrix estimation, and is instead tackled via highly specialized optimization methods. This paper offers a unique harmonic analysis view of this problem. By casting the estimation of the precision matrix in terms of a composition of low-frequency latent variables and high-frequency sparse terms, we show how the problem can be formulated using a new wavelet-type expansion in non-Euclidean spaces. Our formulation poses the estimation problem in the frequency space and shows how it can be solved by a simple sub-gradient scheme. We provide a set of scientific results on ~500 scans from the recently released HCP data where our algorithm recovers highly interpretable and sparse conditional dependencies between brain connectivity pathways and well-known covariates.
{"title":"Latent Variable Graphical Model Selection using Harmonic Analysis: Applications to the Human Connectome Project (HCP).","authors":"Won Hwa Kim, Hyunwoo J Kim, Nagesh Adluru, Vikas Singh","doi":"10.1109/CVPR.2016.268","DOIUrl":"10.1109/CVPR.2016.268","url":null,"abstract":"<p><p>A major goal of imaging studies such as the (ongoing) Human Connectome Project (HCP) is to characterize the structural network map of the human brain and identify its associations with covariates such as genotype, risk factors, and so on that correspond to an individual. But the set of image derived measures and the set of covariates are both large, so we must first estimate a 'parsimonious' set of relations between the measurements. For instance, a Gaussian graphical model will show conditional independences between the random variables, which can then be used to setup specific downstream analyses. But most such data involve a large list of 'latent' variables that remain unobserved, yet affect the 'observed' variables sustantially. Accounting for such latent variables is not directly addressed by standard precision matrix estimation, and is tackled via highly specialized optimization methods. This paper offers a unique harmonic analysis view of this problem. By casting the estimation of the precision matrix in terms of a composition of low-frequency latent variables and high-frequency sparse terms, we show how the problem can be formulated using a new wavelet-type expansion in non-Euclidean spaces. Our formulation poses the estimation problem in the frequency space and shows how it can be solved by a simple sub-gradient scheme. We provide a set of scientific results on ~500 scans from the recently released HCP data where our algorithm recovers highly interpretable and sparse conditional dependencies between brain connectivity pathways and well-known covariates.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2016 ","pages":"2443-2451"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5330303/pdf/nihms824434.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34778815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SemiContour: A Semi-supervised Learning Approach for Contour Detection
Pub Date: 2016-06-01 | Epub Date: 2016-12-12 | DOI: 10.1109/CVPR.2016.34
Zizhao Zhang, Fuyong Xing, Xiaoshuang Shi, Lin Yang
Supervised contour detection methods usually require many labeled training images to obtain satisfactory performance. However, a large set of annotated data might be unavailable, or assembling it might be extremely labor intensive. In this paper, we investigate the use of semi-supervised learning (SSL) to obtain competitive detection accuracy with very limited training data (three labeled images). Specifically, we propose a semi-supervised structured ensemble learning approach for contour detection built on structured random forests (SRF). To make SRF applicable to unlabeled data, we present an effective sparse representation approach that captures the inherent structure in image patches by finding a compact and discriminative low-dimensional subspace representation in an unsupervised manner, enabling the incorporation of abundant unlabeled patches with their estimated structured labels to help SRF perform better node splitting. We re-examine the role of sparsity and propose a novel and fast sparse coding algorithm to boost the overall learning efficiency. To the best of our knowledge, this is the first attempt to apply SSL to contour detection. Extensive experiments on the BSDS500 segmentation dataset and the NYU Depth dataset demonstrate the superiority of the proposed method.
{"title":"SemiContour: A Semi-supervised Learning Approach for Contour Detection.","authors":"Zizhao Zhang, Fuyong Xing, Xiaoshuang Shi, Lin Yang","doi":"10.1109/CVPR.2016.34","DOIUrl":"10.1109/CVPR.2016.34","url":null,"abstract":"<p><p>Supervised contour detection methods usually require many labeled training images to obtain satisfactory performance. However, a large set of annotated data might be unavailable or extremely labor intensive. In this paper, we investigate the usage of semi-supervised learning (SSL) to obtain competitive detection accuracy with very limited training data (three labeled images). Specifically, we propose a semi-supervised structured ensemble learning approach for contour detection built on structured random forests (SRF). To allow SRF to be applicable to unlabeled data, we present an effective sparse representation approach to capture inherent structure in image patches by finding a compact and discriminative low-dimensional subspace representation in an unsupervised manner, enabling the incorporation of abundant unlabeled patches with their estimated structured labels to help SRF perform better node splitting. We re-examine the role of sparsity and propose a novel and fast sparse coding algorithm to boost the overall learning efficiency. To the best of our knowledge, this is the first attempt to apply SSL for contour detection. Extensive experiments on the BSDS500 segmentation dataset and the NYU Depth dataset demonstrate the superiority of the proposed method.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2016 ","pages":"251-259"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5423734/pdf/nihms-818937.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34988057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shape Analysis with Hyperbolic Wasserstein Distance
Pub Date: 2016-06-01 | Epub Date: 2016-12-12 | DOI: 10.1109/CVPR.2016.546
Jie Shi, Wen Zhang, Yalin Wang
Shape space is an active research field in computer vision. The shape distance defined in a shape space may provide a simple and refined index to represent a unique shape. The Wasserstein distance defines a Riemannian metric for the Wasserstein space; it intrinsically measures the similarities between shapes and is robust to image noise, and thus has potential for 3D shape indexing and classification research. While algorithms for computing the Wasserstein distance have been extensively studied, most of them only work for genus-0 surfaces. This paper proposes a novel framework to compute the Wasserstein distance between general topological surfaces equipped with hyperbolic metrics. The computational algorithms are based on Ricci flow, hyperbolic harmonic maps, and hyperbolic power Voronoi diagrams, and the method is general and robust. We apply our method to study human facial expression, longitudinal brain cortical morphometry with normal aging, and cortical shape classification in Alzheimer's disease (AD). Experimental results demonstrate that our method may be used as an effective shape index, outperforming several standard shape measures in our AD versus healthy control classification study.
{"title":"Shape Analysis with Hyperbolic Wasserstein Distance.","authors":"Jie Shi, Wen Zhang, Yalin Wang","doi":"10.1109/CVPR.2016.546","DOIUrl":"https://doi.org/10.1109/CVPR.2016.546","url":null,"abstract":"<p><p>Shape space is an active research field in computer vision study. The shape distance defined in a shape space may provide a simple and refined index to represent a unique shape. Wasserstein distance defines a Riemannian metric for the Wasserstein space. It intrinsically measures the similarities between shapes and is robust to image noise. Thus it has the potential for the 3D shape indexing and classification research. While the algorithms for computing Wasserstein distance have been extensively studied, most of them only work for genus-0 surfaces. This paper proposes a novel framework to compute Wasserstein distance between general topological surfaces with hyperbolic metric. The computational algorithms are based on Ricci flow, hyperbolic harmonic map, and hyperbolic power Voronoi diagram and the method is general and robust. We apply our method to study human facial expression, longitudinal brain cortical morphometry with normal aging, and cortical shape classification in Alzheimer's disease (AD). Experimental results demonstrate that our method may be used as an effective shape index, which outperforms some other standard shape measures in our AD versus healthy control classification study.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2016 ","pages":"5051-5061"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CVPR.2016.546","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34898674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint Patch and Multi-label Learning for Facial Action Unit Detection
Pub Date: 2015-06-01 | DOI: 10.1109/CVPR.2015.7298833
Kaili Zhao, Wen-Sheng Chu, Fernando De la Torre, Jeffrey F Cohn, Honggang Zhang
The face is one of the most powerful channels of nonverbal communication. The most commonly used taxonomy to describe facial behaviour is the Facial Action Coding System (FACS). FACS segments the visible effects of facial muscle activation into 30+ action units (AUs). AUs, which may occur alone or in thousands of combinations, can describe nearly all possible facial expressions. Most existing methods for automatic AU detection treat the problem using one-vs-all classifiers and fail to exploit dependencies among AUs and facial features. We introduce joint patch and multi-label learning (JPML) to address these issues. JPML leverages group sparsity by selecting a sparse subset of facial patches while learning a multi-label classifier. In four of five comparisons on three diverse datasets, CK+, GFT, and BP4D, JPML produced the highest average F1 scores in comparison with the state of the art.
{"title":"Joint Patch and Multi-label Learning for Facial Action Unit Detection.","authors":"Kaili Zhao, Wen-Sheng Chu, Fernando De la Torre, Jeffrey F Cohn, Honggang Zhang","doi":"10.1109/CVPR.2015.7298833","DOIUrl":"10.1109/CVPR.2015.7298833","url":null,"abstract":"<p><p>The face is one of the most powerful channel of nonverbal communication. The most commonly used taxonomy to describe facial behaviour is the Facial Action Coding System (FACS). FACS segments the visible effects of facial muscle activation into 30+ action units (AUs). AUs, which may occur alone and in thousands of combinations, can describe nearly all-possible facial expressions. Most existing methods for automatic AU detection treat the problem using one-vs-all classifiers and fail to exploit dependencies among AU and facial features. We introduce joint-patch and multi-label learning (JPML) to address these issues. JPML leverages group sparsity by selecting a sparse subset of facial patches while learning a multi-label classifier. In four of five comparisons on three diverse datasets, CK+, GFT, and BP4D, JPML produced the highest average F1 scores in comparison with state-of-the art.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2015 ","pages":"2207-2216"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CVPR.2015.7298833","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34542635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multivariate General Linear Models (MGLM) on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images
Pub Date: 2014-06-23 | DOI: 10.1109/CVPR.2014.352
Hyunwoo J Kim, Nagesh Adluru, Maxwell D Collins, Moo K Chung, Barbara B Bendlin, Sterling C Johnson, Richard J Davidson, Vikas Singh
Linear regression is a parametric model which is ubiquitous in scientific analysis. The classical setup, where the observations and responses, i.e., (xi, yi) pairs, are Euclidean, is well studied. The setting where yi is manifold-valued is a topic of much interest, motivated by applications in shape analysis, topic modeling, and medical imaging. Recent work gives strategies for max-margin classifiers, principal components analysis, and dictionary learning on certain types of manifolds. For parametric regression specifically, results within the last year provide mechanisms to regress one real-valued parameter, xi ∈ R, against a manifold-valued variable, yi ∈ ℳ. We seek to substantially extend the operating range of such methods by deriving schemes for multivariate multiple linear regression - a manifold-valued dependent variable against multiple independent variables, i.e., f : R^n → ℳ. Our variational algorithm efficiently solves for multiple geodesic bases on the manifold concurrently via gradient updates. This allows us to answer questions such as: what is the relationship of the measurement at voxel y to disease when conditioned on age and gender? We show applications to statistical analysis of diffusion weighted images, which give rise to regression tasks on the manifold GL(n)/O(n) for diffusion tensor images (DTI) and on the Hilbert unit sphere for orientation distribution functions (ODF) from high angular resolution acquisitions. The companion open-source code is available at nitrc.org/projects/riem_mglm.
{"title":"Multivariate General Linear Models (MGLM) on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images.","authors":"Hyunwoo J Kim, Nagesh Adluru, Maxwell D Collins, Moo K Chung, Barbara B Bendlin, Sterling C Johnson, Richard J Davidson, Vikas Singh","doi":"10.1109/CVPR.2014.352","DOIUrl":"https://doi.org/10.1109/CVPR.2014.352","url":null,"abstract":"<p><p>Linear regression is a parametric model which is ubiquitous in scientific analysis. The classical setup where the observations and responses, i.e., (<i>x<sub>i</sub></i> , <i>y<sub>i</sub></i> ) pairs, are Euclidean is well studied. The setting where <i>y<sub>i</sub></i> is manifold valued is a topic of much interest, motivated by applications in shape analysis, topic modeling, and medical imaging. Recent work gives strategies for max-margin classifiers, principal components analysis, and dictionary learning on certain types of manifolds. For parametric regression specifically, results within the last year provide mechanisms to regress one real-valued parameter, <i>x<sub>i</sub></i> ∈ <b>R</b>, against a manifold-valued variable, <i>y<sub>i</sub></i> ∈ . We seek to substantially extend the operating range of such methods by deriving schemes for multivariate multiple linear regression -a manifold-valued dependent variable against multiple independent variables, i.e., <i>f</i> : <b>R</b><i><sup>n</sup></i> → . Our variational algorithm efficiently solves for multiple geodesic bases on the manifold concurrently via gradient updates. This allows us to answer questions such as: what is the relationship of the measurement at voxel <i>y</i> to disease when conditioned on age and gender. We show applications to statistical analysis of diffusion weighted images, which give rise to regression tasks on the manifold <i>GL</i>(<i>n</i>)/<i>O</i>(<i>n</i>) for diffusion tensor images (DTI) and the Hilbert unit sphere for orientation distribution functions (ODF) from high angular resolution acquisition. The companion open-source code is available on nitrc.org/projects/riem_mglm.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2014 ","pages":"2705-2712"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CVPR.2014.352","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32967355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single Image Super-resolution using Deformable Patches
Pub Date: 2014-06-01 | DOI: 10.1109/CVPR.2014.373
Yu Zhu, Yanning Zhang, Alan L Yuille
We propose a deformable-patch-based method for single image super-resolution. Under this notion of deformation, a patch is regarded not as a fixed vector but as a flexible deformation flow. Via deformable patches, the dictionary can cover more patterns that do not appear explicitly, and thus becomes more expressive. We present an energy function with slow, smooth, and flexible priors for the deformation model. During example-based super-resolution, we develop a deformation similarity based on the minimized energy function for basic patch matching. For robustness, we combine multiple deformed patches for the final reconstruction. Experiments evaluate the deformation effectiveness and super-resolution performance, showing that deformable patches help improve representation accuracy and perform better than state-of-the-art methods.
Pages: 2917-2924. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4249591/pdf/nihms-612394.pdf
Modeling Image Patches with a Generic Dictionary of Mini-Epitomes
Pub Date: 2014-06-01 | DOI: 10.1109/CVPR.2014.264
George Papandreou, Liang-Chieh Chen, Alan L Yuille
The goal of this paper is to question the necessity of features like SIFT in categorical visual recognition tasks. As an alternative, we develop a generative model for the raw intensity of image patches and show that it can support image classification performance on par with optimized SIFT-based techniques in a bag-of-visual-words setting. A key ingredient of the proposed model is a compact dictionary of mini-epitomes, learned in an unsupervised fashion on a large collection of images. The use of epitomes allows us to explicitly account for photometric and position variability in image appearance. We show that this flexibility considerably increases the capacity of the dictionary to accurately approximate the appearance of image patches and support recognition tasks. For image classification, we develop histogram-based image encoding methods tailored to the epitomic representation, as well as an "epitomic footprint" encoding which is easy to visualize and highlights the generative nature of our model. We discuss computational aspects in detail and develop efficient algorithms to make the model scalable to large tasks. The proposed techniques are evaluated with experiments on the challenging PASCAL VOC 2007 image classification benchmark.
{"title":"Modeling Image Patches with a Generic Dictionary of Mini-Epitomes.","authors":"George Papandreou, Liang-Chieh Chen, Alan L Yuille","doi":"10.1109/CVPR.2014.264","DOIUrl":"https://doi.org/10.1109/CVPR.2014.264","url":null,"abstract":"<p><p>The goal of this paper is to question the necessity of features like SIFT in categorical visual recognition tasks. As an alternative, we develop a generative model for the raw intensity of image patches and show that it can support image classification performance on par with optimized SIFT-based techniques in a bag-of-visual-words setting. Key ingredient of the proposed model is a compact dictionary of mini-epitomes, learned in an unsupervised fashion on a large collection of images. The use of epitomes allows us to explicitly account for photometric and position variability in image appearance. We show that this flexibility considerably increases the capacity of the dictionary to accurately approximate the appearance of image patches and support recognition tasks. For image classification, we develop histogram-based image encoding methods tailored to the epitomic representation, as well as an \"epitomic footprint\" encoding which is easy to visualize and highlights the generative nature of our model. We discuss in detail computational aspects and develop efficient algorithms to make the model scalable to large tasks. The proposed techniques are evaluated with experiments on the challenging PASCAL VOC 2007 image classification benchmark.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2014 ","pages":"2059-2066"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CVPR.2014.264","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33964769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}