The Geometry of First-Returning Photons for Non-Line-of-Sight Imaging
Chia-Yin Tsai, Kiriakos N. Kutulakos, S. Narasimhan, Aswin C. Sankaranarayanan
Non-line-of-sight (NLOS) imaging utilizes the full 5D light transient measurements to reconstruct scenes beyond the camera's field of view. Mathematically, this requires solving an elliptical tomography problem that unmixes the shape and albedo from spatially-multiplexed measurements of the NLOS scene. In this paper, we propose a new approach for NLOS imaging by studying the properties of first-returning photons from three-bounce light paths. We show that the times of flight of first-returning photons depend only on the geometry of the NLOS scene, and that each observation is almost always generated from a single NLOS scene point. Exploiting these properties, we derive a space carving algorithm for NLOS scenes. In addition, by assuming local planarity, we derive an algorithm to localize NLOS scene points in 3D and estimate their surface normals. Our methods require neither the full transient measurements nor solving the hard elliptical tomography problem. We demonstrate the effectiveness of our methods through simulations as well as real data captured from a SPAD sensor.
{"title":"The Geometry of First-Returning Photons for Non-Line-of-Sight Imaging","authors":"Chia-Yin Tsai, Kiriakos N. Kutulakos, S. Narasimhan, Aswin C. Sankaranarayanan","doi":"10.1109/CVPR.2017.251","DOIUrl":"https://doi.org/10.1109/CVPR.2017.251","url":null,"abstract":"Non-line-of-sight (NLOS) imaging utilizes the full 5D light transient measurements to reconstruct scenes beyond the cameras field of view. Mathematically, this requires solving an elliptical tomography problem that unmixes the shape and albedo from spatially-multiplexed measurements of the NLOS scene. In this paper, we propose a new approach for NLOS imaging by studying the properties of first-returning photons from three-bounce light paths. We show that the times of flight of first-returning photons are dependent only on the geometry of the NLOS scene and each observation is almost always generated from a single NLOS scene point. Exploiting these properties, we derive a space carving algorithm for NLOS scenes. In addition, by assuming local planarity, we derive an algorithm to localize NLOS scene points in 3D and estimate their surface normals. Our methods do not require either the full transient measurements or solving the hard elliptical tomography problem. We demonstrate the effectiveness of our methods through simulations as well as real data captured from a SPAD sensor.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"74 1","pages":"2336-2344"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88772605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero Shot Learning via Multi-scale Manifold Regularization
Shay Deutsch, Soheil Kolouri, Kyungnam Kim, Y. Owechko, Stefano Soatto
We address zero-shot learning using a new manifold alignment framework based on a localized multi-scale transform on graphs. Our inference approach includes a smoothness criterion for a function mapping nodes on a graph (visual representation) onto a linear space (semantic representation), which we optimize using multi-scale graph wavelets. The robustness of the ensuing scheme allows us to operate with automatically generated semantic annotations, resulting in an algorithm that is entirely free of manual supervision, and yet improves the state-of-the-art as measured on benchmark datasets.
{"title":"Zero Shot Learning via Multi-scale Manifold Regularization","authors":"Shay Deutsch, Soheil Kolouri, Kyungnam Kim, Y. Owechko, Stefano Soatto","doi":"10.1109/CVPR.2017.562","DOIUrl":"https://doi.org/10.1109/CVPR.2017.562","url":null,"abstract":"We address zero-shot learning using a new manifold alignment framework based on a localized multi-scale transform on graphs. Our inference approach includes a smoothness criterion for a function mapping nodes on a graph (visual representation) onto a linear space (semantic representation), which we optimize using multi-scale graph wavelets. The robustness of the ensuing scheme allows us to operate with automatically generated semantic annotations, resulting in an algorithm that is entirely free of manual supervision, and yet improves the state-of-the-art as measured on benchmark datasets.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"30 1","pages":"5292-5299"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78799092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discretely Coding Semantic Rank Orders for Supervised Image Hashing
Li Liu, Ling Shao, Fumin Shen, Mengyang Yu
Learning to hash is widely recognized as a way to achieve highly efficient storage and retrieval of large-scale visual data. In particular, ranking-based hashing techniques have recently attracted broad research attention because they explicitly optimize the ranking accuracy of the retrieved data, an objective better matched to realistic search tasks. However, directly optimizing discrete hash codes, without continuous relaxations, under a nonlinear ranking objective is infeasible for traditional optimization methods and even for recent discrete hashing algorithms. To address this challenging issue, we introduce a novel supervised hashing method, dubbed Discrete Semantic Ranking Hashing (DSeRH), which aims to embed semantic rank orders directly into binary codes. In DSeRH, a generalized Adaptive Discrete Minimization (ADM) approach is proposed to discretely optimize binary codes under the quadratic nonlinear ranking objective in an iterative manner, with guaranteed fast convergence. Additionally, instead of using 0/1 independent labels to form rank orders as in previous works, we generate listwise rank orders from high-level semantic word embeddings, which quantitatively capture the intrinsic correlation between different categories. We evaluate DSeRH, coupled with both linear and deep convolutional neural network (CNN) hash functions, on three image datasets, i.e., CIFAR-10, SUN397 and ImageNet100, and the results show that DSeRH outperforms state-of-the-art ranking-based hashing methods.
{"title":"Discretely Coding Semantic Rank Orders for Supervised Image Hashing","authors":"Li Liu, Ling Shao, Fumin Shen, Mengyang Yu","doi":"10.1109/CVPR.2017.546","DOIUrl":"https://doi.org/10.1109/CVPR.2017.546","url":null,"abstract":"Learning to hash has been recognized to accomplish highly efficient storage and retrieval for large-scale visual data. Particularly, ranking-based hashing techniques have recently attracted broad research attention because ranking accuracy among the retrieved data is well explored and their objective is more applicable to realistic search tasks. However, directly optimizing discrete hash codes without continuous-relaxations on a nonlinear ranking objective is infeasible by either traditional optimization methods or even recent discrete hashing algorithms. To address this challenging issue, in this paper, we introduce a novel supervised hashing method, dubbed Discrete Semantic Ranking Hashing (DSeRH), which aims to directly embed semantic rank orders into binary codes. In DSeRH, a generalized Adaptive Discrete Minimization (ADM) approach is proposed to discretely optimize binary codes with the quadratic nonlinear ranking objective in an iterative manner and is guaranteed to converge quickly. Additionally, instead of using 0/1 independent labels to form rank orders as in previous works, we generate the listwise rank orders from the high-level semantic word embeddings which can quantitatively capture the intrinsic correlation between different categories. We evaluate our DSeRH, coupled with both linear and deep convolutional neural network (CNN) hash functions, on three image datasets, i.e., CIFAR-10, SUN397 and ImageNet100, and the results manifest that DSeRH can outperform the state-of-the-art ranking-based hashing methods.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"44 25 1","pages":"5140-5149"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72639515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework
Jongyoo Kim, Sanghoon Lee
Since human observers are the ultimate receivers of digital images, image quality metrics should be designed from a human-oriented perspective. Conventionally, a number of full-reference image quality assessment (FR-IQA) methods adopted various computational models of the human visual system (HVS) from psychological vision science research. In this paper, we propose a novel convolutional neural network (CNN) based FR-IQA model, named Deep Image Quality Assessment (DeepQA), in which the behavior of the HVS is learned from the underlying data distribution of IQA databases. Different from previous studies, our model seeks the optimal visual weighting from the data itself, without any prior knowledge of the HVS. Through experiments, we show that the predicted visual sensitivity maps agree with human subjective opinions. In addition, DeepQA achieves state-of-the-art prediction accuracy among FR-IQA models.
{"title":"Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework","authors":"Jongyoo Kim, Sanghoon Lee","doi":"10.1109/CVPR.2017.213","DOIUrl":"https://doi.org/10.1109/CVPR.2017.213","url":null,"abstract":"Since human observers are the ultimate receivers of digital images, image quality metrics should be designed from a human-oriented perspective. Conventionally, a number of full-reference image quality assessment (FR-IQA) methods adopted various computational models of the human visual system (HVS) from psychological vision science research. In this paper, we propose a novel convolutional neural networks (CNN) based FR-IQA model, named Deep Image Quality Assessment (DeepQA), where the behavior of the HVS is learned from the underlying data distribution of IQA databases. Different from previous studies, our model seeks the optimal visual weight based on understanding of database information itself without any prior knowledge of the HVS. Through the experiments, we show that the predicted visual sensitivity maps agree with the human subjective opinions. In addition, DeepQA achieves the state-of-the-art prediction accuracy among FR-IQA models.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"32 1","pages":"1969-1977"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81250728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Grassmannian Manifold Optimization Assisted Sparse Spectral Clustering
Qiong Wang, Junbin Gao, Hong Li
Spectral clustering is one of the pioneering clustering methods in machine learning and pattern recognition. It relies on a spectral decomposition criterion to learn a low-dimensional embedding of the data for a basic clustering algorithm such as k-means. The recent sparse spectral clustering (SSC) introduces sparsity of the similarity in the low-dimensional space by enforcing a sparsity-inducing penalty, resulting in a non-convex optimization; the solution is computed from a relaxed convex problem via the standard ADMM (Alternating Direction Method of Multipliers), rather than by inferring the latent representation from the eigen-structure. This paper provides a direct solution by formulating a new optimization problem on the Grassmann manifold. In this way, computing the latent embedding becomes part of an optimization over manifolds, and recently developed manifold optimization methods can be applied. It turns out that the learned features are not only very informative for clustering, but also more intuitive and effective for visualization after dimensionality reduction. We conduct empirical studies on simulated datasets and several real-world benchmark datasets to validate the proposed methods. Experimental results exhibit the effectiveness of this new manifold-based clustering and dimensionality reduction method.
{"title":"Grassmannian Manifold Optimization Assisted Sparse Spectral Clustering","authors":"Qiong Wang, Junbin Gao, Hong Li","doi":"10.1109/CVPR.2017.335","DOIUrl":"https://doi.org/10.1109/CVPR.2017.335","url":null,"abstract":"Spectral Clustering is one of pioneered clustering methods in machine learning and pattern recognition field. It relies on the spectral decomposition criterion to learn a low-dimensonal embedding of data for a basic clustering algorithm such as the k-means. The recent sparse Spectral clustering (SSC) introduces the sparsity for the similarity in low-dimensional space by enforcing a sparsity-induced penalty, resulting a non-convex optimization, and the solution is calculated through a relaxed convex problem via the standard ADMM (Alternative Direction Method of Multipliers), rather than inferring latent representation from eigen-structure. This paper provides a direct solution as solving a new Grassmann optimization problem. By this way calculating latent embedding becomes part of optmization on manifolds and the recently developed manifold optimization methods can be applied. It turns out the learned new features are not only very informative for clustering, but also more intuitive and effective in visualization after dimensionality reduction. We conduct empirical studies on simulated datasets and several real-world benchmark datasets to validate the proposed methods. Experimental results exhibit the effectiveness of this new manifold-based clustering and dimensionality reduction method.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"2 1","pages":"3145-3153"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79922376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
Junho Yim, Donggyu Joo, Ji-Hoon Bae, Junmo Kim
We introduce a novel technique for knowledge transfer, where knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN. As a DNN maps from the input space to the output space through many layers sequentially, we define the distilled knowledge to be transferred in terms of the flow between layers, which is calculated as the inner product between features from two layers. When we compare the student DNN with an original network of the same size trained without a teacher, the proposed method of transferring the distilled knowledge as the flow between two layers exhibits three important phenomena: (1) the student DNN that learns the distilled knowledge is optimized much faster than the original model; (2) the student DNN outperforms the original DNN; and (3) the student DNN can learn the distilled knowledge from a teacher DNN trained on a different task, and the student DNN outperforms the original DNN trained from scratch.
{"title":"A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning","authors":"Junho Yim, Donggyu Joo, Ji-Hoon Bae, Junmo Kim","doi":"10.1109/CVPR.2017.754","DOIUrl":"https://doi.org/10.1109/CVPR.2017.754","url":null,"abstract":"We introduce a novel technique for knowledge transfer, where knowledge from a pretrained deep neural network (DNN) is distilled and transferred to another DNN. As the DNN performs a mapping from the input space to the output space through many layers sequentially, we define the distilled knowledge to be transferred in terms of flow between layers, which is calculated by computing the inner product between features from two layers. When we compare the student DNN and the original network with the same size as the student DNN but trained without a teacher network, the proposed method of transferring the distilled knowledge as the flow between two layers exhibits three important phenomena: (1) the student DNN that learns the distilled knowledge is optimized much faster than the original model, (2) the student DNN outperforms the original DNN, and (3) the student DNN can learn the distilled knowledge from a teacher DNN that is trained at a different task, and the student DNN outperforms the original DNN that is trained from scratch.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"40 1","pages":"7130-7138"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91203950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images
Liuhao Ge, Hui Liang, Junsong Yuan, D. Thalmann
We propose a simple yet effective approach for real-time hand pose estimation from single depth images using three-dimensional Convolutional Neural Networks (3D CNNs). Image-based features extracted by 2D CNNs are not directly suitable for 3D hand pose estimation due to the lack of 3D spatial information. Our proposed 3D CNN, which takes a 3D volumetric representation of the hand depth image as input, can capture the 3D spatial structure of the input and accurately regress the full 3D hand pose in a single pass. To make the 3D CNN robust to variations in hand size and global orientation, we perform 3D data augmentation on the training data. Experiments show that our 3D CNN based approach outperforms state-of-the-art methods on two challenging hand pose datasets, and is very efficient: our implementation runs at over 215 fps on a standard computer with a single GPU.
{"title":"3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images","authors":"Liuhao Ge, Hui Liang, Junsong Yuan, D. Thalmann","doi":"10.1109/CVPR.2017.602","DOIUrl":"https://doi.org/10.1109/CVPR.2017.602","url":null,"abstract":"We propose a simple, yet effective approach for real-time hand pose estimation from single depth images using three-dimensional Convolutional Neural Networks (3D CNNs). Image based features extracted by 2D CNNs are not directly suitable for 3D hand pose estimation due to the lack of 3D spatial information. Our proposed 3D CNN taking a 3D volumetric representation of the hand depth image as input can capture the 3D spatial structure of the input and accurately regress full 3D hand pose in a single pass. In order to make the 3D CNN robust to variations in hand sizes and global orientations, we perform 3D data augmentation on the training data. Experiments show that our proposed 3D CNN based approach outperforms state-of-the-art methods on two challenging hand pose datasets, and is very efficient as our implementation runs at over 215 fps on a standard computer with a single GPU.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"6 1","pages":"5679-5688"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91343156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-Shot Classification with Discriminative Semantic Representation Learning
Meng Ye, Yuhong Guo
Zero-shot learning, a special case of unsupervised domain adaptation where the source and target domains have disjoint label spaces, has become increasingly popular in the computer vision community. In this paper, we propose a novel zero-shot learning method based on discriminative sparse non-negative matrix factorization. The proposed approach aims to identify a set of common high-level semantic components across the two domains via non-negative sparse matrix factorization, while enforcing the representation vectors of the images in this common component-based space to be discriminatively aligned with the attribute-based label representation vectors. To fully exploit the aligned semantic information contained in the learned representation vectors of the instances, we develop a label propagation based testing procedure to classify the unlabeled instances from the unseen classes in the target domain. We conduct experiments on four standard zero-shot learning image datasets, by comparing the proposed approach to the state-of-the-art zero-shot learning methods. The empirical results demonstrate the efficacy of the proposed approach.
{"title":"Zero-Shot Classification with Discriminative Semantic Representation Learning","authors":"Meng Ye, Yuhong Guo","doi":"10.1109/CVPR.2017.542","DOIUrl":"https://doi.org/10.1109/CVPR.2017.542","url":null,"abstract":"Zero-shot learning, a special case of unsupervised domain adaptation where the source and target domains have disjoint label spaces, has become increasingly popular in the computer vision community. In this paper, we propose a novel zero-shot learning method based on discriminative sparse non-negative matrix factorization. The proposed approach aims to identify a set of common high-level semantic components across the two domains via non-negative sparse matrix factorization, while enforcing the representation vectors of the images in this common component-based space to be discriminatively aligned with the attribute-based label representation vectors. To fully exploit the aligned semantic information contained in the learned representation vectors of the instances, we develop a label propagation based testing procedure to classify the unlabeled instances from the unseen classes in the target domain. We conduct experiments on four standard zero-shot learning image datasets, by comparing the proposed approach to the state-of-the-art zero-shot learning methods. The empirical results demonstrate the efficacy of the proposed approach.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"55 1","pages":"5103-5111"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91031571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Visual Tracking Using Oblique Random Forests
Le Zhang, Jagannadan Varadarajan, P. N. Suganthan, N. Ahuja, P. Moulin
Random forest has emerged as a powerful classification technique with promising results in various vision tasks, including image classification, pose estimation and object detection. However, current techniques have shown little improvement in visual tracking, as they mostly rely on piecewise orthogonal hyperplanes to create decision nodes and lack the robust incremental learning mechanism that online tracking requires. In this paper, we propose a discriminative tracker based on a novel incremental oblique random forest. Unlike conventional orthogonal decision trees that use a single feature and heuristic measures to obtain a split at each node, we propose to use a more powerful proximal SVM to obtain oblique hyperplanes that better capture the geometric structure of the data. The resulting decision surface is not restricted to be axis-aligned and hence can represent and classify the input data better. Furthermore, in order to generalize to online tracking scenarios, we derive incremental update steps that enable the hyperplanes in each node to be updated recursively, efficiently and in closed form. We demonstrate the effectiveness of our method on two large-scale benchmark datasets (OTB-51 and OTB-100) and show that it gives competitive results on several challenging cases, both with simple HOG features and in combination with more sophisticated deep neural network based models. Implementations of the proposed random forest are available at https://github.com/ZhangLeUestc/Incremental-Oblique-Random-Forest.
{"title":"Robust Visual Tracking Using Oblique Random Forests","authors":"Le Zhang, Jagannadan Varadarajan, P. N. Suganthan, N. Ahuja, P. Moulin","doi":"10.1109/CVPR.2017.617","DOIUrl":"https://doi.org/10.1109/CVPR.2017.617","url":null,"abstract":"Random forest has emerged as a powerful classification technique with promising results in various vision tasks including image classification, pose estimation and object detection. However, current techniques have shown little improvements in visual tracking as they mostly rely on piece wise orthogonal hyperplanes to create decision nodes and lack a robust incremental learning mechanism that is much needed for online tracking. In this paper, we propose a discriminative tracker based on a novel incremental oblique random forest. Unlike conventional orthogonal decision trees that use a single feature and heuristic measures to obtain a split at each node, we propose to use a more powerful proximal SVM to obtain oblique hyperplanes to capture the geometric structure of the data better. The resulting decision surface is not restricted to be axis aligned and hence has the ability to represent and classify the input data better. Furthermore, in order to generalize to online tracking scenarios, we derive incremental update steps that enable the hyperplanes in each node to be updated recursively, efficiently and in a closed-form fashion. We demonstrate the effectiveness of our method using two large scale benchmark datasets (OTB-51 and OTB-100) and show that our method gives competitive results on several challenging cases by relying on simple HOG features as well as in combination with more sophisticated deep neural network based models. The implementations of the proposed random forest are available at https://github.com/ZhangLeUestc/ Incremental-Oblique-Random-Forest.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"5825-5834"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89883445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild
Dafang He, X. Yang, Chen Liang, Zihan Zhou, Alexander Ororbia, Daniel Kifer, C. Lee Giles
Scene text detection has attracted great attention in recent years. Text potentially exists in a wide variety of images and videos and plays an important role in understanding the scene. In this paper, we present a novel text detection algorithm composed of two cascaded steps: (1) a multi-scale fully convolutional neural network (FCN) is proposed to extract text block regions; (2) a novel instance (word or line) aware segmentation is designed to further remove false positives and obtain word instances. The proposed algorithm can accurately localize words or text lines in arbitrary orientations, including curved text lines, which many other frameworks cannot handle. Our algorithm achieves state-of-the-art performance on the ICDAR 2013 (IC13), ICDAR 2015 (IC15), CUTE80 and Street View Text (SVT) benchmark datasets.
{"title":"Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild","authors":"Dafang He, X. Yang, Chen Liang, Zihan Zhou, Alexander Ororbia, Daniel Kifer, C. Lee Giles","doi":"10.1109/CVPR.2017.58","DOIUrl":"https://doi.org/10.1109/CVPR.2017.58","url":null,"abstract":"Scene text detection has attracted great attention these years. Text potentially exist in a wide variety of images or videos and play an important role in understanding the scene. In this paper, we present a novel text detection algorithm which is composed of two cascaded steps: (1) a multi-scale fully convolutional neural network (FCN) is proposed to extract text block regions, (2) a novel instance (word or line) aware segmentation is designed to further remove false positives and obtain word instances. The proposed algorithm can accurately localize word or text line in arbitrary orientations, including curved text lines which cannot be handled in a lot of other frameworks. Our algorithm achieved state-of-the-art performance in ICDAR 2013 (IC13), ICDAR 2015 (IC15) and CUTE80 and Street View Text (SVT) benchmark datasets.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"32 1","pages":"474-483"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87442347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}