Scene-Text-Detection Method Robust Against Orientation and Discontiguous Components of Characters
Rei Endo, Yoshihiko Kawai, H. Sumiyoshi, Masanori Sano
Scene-text detection in natural-scene images is an important technique because scene text carries location information such as the names of places and buildings, but many difficulties still remain for practical use. In this paper, we tackle two problems of scene-text detection. The first is the discontiguous-component problem, which arises in languages whose characters consist of discontiguous components. The second is the multi-orientation problem, which affects all languages. To solve these two problems, we propose a connected-component-based scene-text-detection method. It combines a novel neighbor-character search method that uses a synthesizable descriptor to handle discontiguous components and a novel region descriptor, the rotated bounding box (RBB) descriptor, to handle rotated characters. We evaluate the proposed method on the well-known MSRA-TD500 dataset, which includes rotated characters with discontiguous components.
{"title":"Scene-Text-Detection Method Robust Against Orientation and Discontiguous Components of Characters","authors":"Rei Endo, Yoshihiko Kawai, H. Sumiyoshi, Masanori Sano","doi":"10.1109/CVPRW.2017.130","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.130","url":null,"abstract":"Scene-text detection in natural-scene images is an important technique because scene texts contain location information such as names of places and buildings, but many difficulties still remain regarding practical use. In this paper, we tackle two problems of scene-text detection. The first is the discontiguous component problem in specific languages that contain characters consisting of discontiguous components. The second is the multi-orientation problem in all languages. To solve these two problems, we propose a connected-component-based scene-text-detection method. Our proposed method involves our novel neighbor-character search method using a synthesizable descriptor for the discontiguous-component problems and our novel region descriptor called the rotated bounding box descriptors (RBBs) for rotated characters. We also evaluated our proposed scene-text-detection method by using the well-known MSRA-TD500 dataset that includes rotated characters with discontiguous components.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"61 1","pages":"941-949"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74711076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Denoising via CNNs: An Adversarial Approach
Nithish Divakar, R. Venkatesh Babu
Is it possible to recover an image from its noisy version using convolutional neural networks? This is an interesting problem because convolutional layers are generally used as feature detectors for tasks such as classification, segmentation, and object detection. We present a new CNN architecture for blind image denoising that synergistically combines three architectural components: a multi-scale feature-extraction layer, which reduces the effect of noise on the feature maps; an ℓp regularizer, which selects only the feature maps appropriate for reconstruction; and a three-step training approach that leverages adversarial training to give the model a final performance boost. The proposed model shows competitive denoising performance compared with state-of-the-art approaches.
{"title":"Image Denoising via CNNs: An Adversarial Approach","authors":"Nithish Divakar, R. Venkatesh Babu","doi":"10.1109/CVPRW.2017.145","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.145","url":null,"abstract":"Is it possible to recover an image from its noisy version using convolutional neural networks? This is an interesting problem as convolutional layers are generally used as feature detectors for tasks like classification, segmentation and object detection. We present a new CNN architecture for blind image denoising which synergically combines three architecture components, a multi-scale feature extraction layer which helps in reducing the effect of noise on feature maps, an ℓp regularizer which helps in selecting only the appropriate feature maps for the task of reconstruction, and finally a three step training approach which leverages adversarial training to give the final performance boost to the model. The proposed model shows competitive denoising performance when compared to the state-of-the-art approaches.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"153 1","pages":"1076-1083"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73166165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ASePPI: Robust Privacy Protection Against De-Anonymization Attacks
Natacha Ruchaud, J. Dugelay
The evolution of video-surveillance systems raises questions concerning the protection of individual privacy. In this paper, we design ASePPI, an Adaptive Scrambling method enabling Privacy Protection and Intelligibility that operates in the H.264/AVC stream and aims to be robust against de-anonymization attacks targeting the restoration of the original image and the re-identification of people. The proposed approach automatically adapts the level of protection to the resolution of the region of interest. Compared with existing methods, our framework provides a better trade-off between privacy protection and the visibility of the scene, with robustness against de-anonymization attacks. Moreover, the impact on the source coding stream is negligible.
{"title":"ASePPI: Robust Privacy Protection Against De-Anonymization Attacks","authors":"Natacha Ruchaud, J. Dugelay","doi":"10.1109/CVPRW.2017.177","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.177","url":null,"abstract":"The evolution of the video surveillance systems generates questions concerning protection of individual privacy. In this paper, we design ASePPI, an Adaptive Scrambling enabling Privacy Protection and Intelligibility method operating in the H.264/AVC stream with the aim to be robust against de-anonymization attacks targeting the restoration of the original image and the re-identification of people. The proposed approach automatically adapts the level of protection according to the resolution of the region of interest. Compared to existing methods, our framework provides a better trade-off between the privacy protection and the visibility of the scene with robustness against de-anonymization attacks. Moreover, the impact on the source coding stream is negligible.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"22 1","pages":"1352-1359"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74447913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-Supervised Neural Aggregation Networks for Human Parsing
Jian Zhao, Jianshu Li, Xuecheng Nie, F. Zhao, Yunpeng Chen, Zhecan Wang, Jiashi Feng, Shuicheng Yan
In this paper, we present a Self-Supervised Neural Aggregation Network (SS-NAN) for human parsing. SS-NAN adaptively learns to aggregate multi-scale features at each pixel "address". To further improve the discriminative capacity of the features, a self-supervised joint loss is adopted as an auxiliary learning strategy, which imposes human joint structure on the parsing results without resorting to extra supervision. The proposed SS-NAN is end-to-end trainable and can be integrated into any advanced neural network to aggregate features according to their importance at different positions and scales and to incorporate rich high-level knowledge of human joint structure from a global perspective, which in turn improves the parsing results. Comprehensive evaluations on the recent Look into Person (LIP) and PASCAL-Person-Part benchmark datasets demonstrate the significant superiority of our method over other state-of-the-art approaches.
{"title":"Self-Supervised Neural Aggregation Networks for Human Parsing","authors":"Jian Zhao, Jianshu Li, Xuecheng Nie, F. Zhao, Yunpeng Chen, Zhecan Wang, Jiashi Feng, Shuicheng Yan","doi":"10.1109/CVPRW.2017.204","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.204","url":null,"abstract":"In this paper, we present a Self-Supervised Neural Aggregation Network (SS-NAN) for human parsing. SS-NAN adaptively learns to aggregate the multi-scale features at each pixel \"address\". In order to further improve the feature discriminative capacity, a self-supervised joint loss is adopted as an auxiliary learning strategy, which imposes human joint structures into parsing results without resorting to extra supervision. The proposed SS-NAN is end-to-end trainable. SS-NAN can be integrated into any advanced neural networks to help aggregate features regarding the importance at different positions and scales and incorporate rich high-level knowledge regarding human joint structures from a global perspective, which in turn improve the parsing results. Comprehensive evaluations on the recent Look into Person (LIP) and the PASCAL-Person-Part benchmark datasets demonstrate the significant superiority of our method over other state-of-the-arts.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"2012 1","pages":"1595-1603"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73689989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Looking Under the Hood: Deep Neural Network Visualization to Interpret Whole-Slide Image Analysis Outcomes for Colorectal Polyps
Bruno Korbar, Andrea M. Olofson, Allen P. Miraflor, Catherine M. Nicka, M. Suriawinata, L. Torresani, A. Suriawinata, S. Hassanpour
Histopathological characterization of colorectal polyps is an important step in determining the risk of colorectal cancer and future surveillance rates for patients. The characterization process is time-intensive and requires years of specialized medical training. In this work, we propose a deep-learning-based image-analysis approach that not only accurately classifies different types of polyps in whole-slide images but also highlights the major regions and features on the slide through a model-visualization approach. We argue that this visualization approach will help explain the underlying reasons for the classification outcomes, significantly reduce the cognitive burden on clinicians, and improve diagnostic accuracy for whole-slide image characterization tasks. Our results show the efficacy of this network-visualization approach in recovering the decisive regions and features for different types of polyps on whole-slide images, as judged by domain-expert pathologists.
{"title":"Looking Under the Hood: Deep Neural Network Visualization to Interpret Whole-Slide Image Analysis Outcomes for Colorectal Polyps","authors":"Bruno Korbar, Andrea M. Olofson, Allen P. Miraflor, Catherine M. Nicka, M. Suriawinata, L. Torresani, A. Suriawinata, S. Hassanpour","doi":"10.1109/CVPRW.2017.114","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.114","url":null,"abstract":"Histopathological characterization of colorectal polyps is an important principle for determining the risk of colorectal cancer and future rates of surveillance for patients. The process of characterization is time-intensive and requires years of specialized medical training. In this work, we propose a deep-learning-based image analysis approach that not only can accurately classify different types of polyps in whole-slide images, but also generates major regions and features on the slide through a model visualization approach. We argue that this visualization approach will make sense of the underlying reasons for the classification outcomes, significantly reduce the cognitive burden on clinicians, and improve the diagnostic accuracy for whole-slide image characterization tasks. Our results show the efficacy of this network visualization approach in recovering decisive regions and features for different types of polyps on whole-slide images according to the domain expert pathologists.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"17 30 1","pages":"821-827"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85032966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Protecting Visual Secrets Using Adversarial Nets
Nisarg Raval, Ashwin Machanavajjhala, Landon P. Cox
Protecting visual secrets is an important problem due to the prevalence of cameras that continuously monitor our surroundings. Any viable solution to this problem should also minimize the impact on the utility of applications that use images. In this work, we build on existing work in adversarial learning to design a perturbation mechanism that jointly optimizes privacy and utility objectives. We provide a feasibility study of the proposed mechanism and present ideas on developing a privacy framework based on the adversarial perturbation mechanism.
{"title":"Protecting Visual Secrets Using Adversarial Nets","authors":"Nisarg Raval, Ashwin Machanavajjhala, Landon P. Cox","doi":"10.1109/CVPRW.2017.174","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.174","url":null,"abstract":"Protecting visual secrets is an important problem due to the prevalence of cameras that continuously monitor our surroundings. Any viable solution to this problem should also minimize the impact on the utility of applications that use images. In this work, we build on the existing work of adversarial learning to design a perturbation mechanism that jointly optimizes privacy and utility objectives. We provide a feasibility study of the proposed mechanism and present ideas on developing a privacy framework based on the adversarial perturbation mechanism.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"10 1","pages":"1329-1332"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84188206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object State Recognition for Automatic AR-Based Maintenance Guidance
P. Dvorák, Radovan Josth, Elisabetta Delponte
This paper describes a component of an Augmented Reality (AR) based system for supporting workers in the manufacturing and maintenance industries. In particular, it describes the component responsible for verifying the steps a worker has performed. Correct handling is crucial in both manufacturing and maintenance, and deviations may cause problems in later stages of production and assembly. The primary aim of such support systems is to make the training of new employees faster and more efficient and to reduce the error rate. We present a method for automatically recognizing an object's state with the objective of verifying a set of tasks performed by a user. The novelty of our approach is that the system automatically recognizes the state of the object and provides immediate feedback to the operator through an AR visualization, enabling fully automatic step-by-step instructions.
{"title":"Object State Recognition for Automatic AR-Based Maintenance Guidance","authors":"P. Dvorák, Radovan Josth, Elisabetta Delponte","doi":"10.1109/CVPRW.2017.164","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.164","url":null,"abstract":"This paper describes a component of an Augmented Reality (AR) based system focused on supporting workers in manufacturing and maintenance industry. Particularly, it describes a component responsible for verification of performed steps. Correct handling is crucial in both manufacturing and maintenance industries and deviations may cause problems in later stages of the production and assembly. The primary aim of such support systems is making the training of new employees faster and more efficient and reducing the error rate. We present a method for automatically recognizing an object's state with the objective of verifying a set of tasks performed by a user. The novelty of our approach is that the system can automatically recognize the state of the object and provide immediate feedback to the operator using an AR visualization enabling fully automatic step-by-step instructions.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"6 2 1","pages":"1244-1250"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84940027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint Mobile-Cloud Video Stabilization
G. S. Adesoye, Oliver Wang
In this work we analyze the complex trade-off between data transfer, computation time, and power consumption when a multi-stage, data-intensive algorithm (in this case video stabilization) is split between a low-power mobile device and a high-power cloud server. We evaluate design choices in terms of which intermediate representations should be transferred to the server and back to the mobile device, and present a graph-based solution that can update the optimal joint mobile-cloud computation split as the hardware configuration or the user's requirements change. The practices we employ in this work can be extended to other mobile computer-vision applications.
{"title":"Joint Mobile-Cloud Video Stabilization","authors":"G. S. Adesoye, Oliver Wang","doi":"10.1109/CVPRW.2017.49","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.49","url":null,"abstract":"In this work we analyze the complex trade-off between data transfer, computation time, and power consumption when a multi-stage data-intensive algorithm (in this case video stabilization) is split between a low power mobile device and high power cloud server. We evaluate design choices in terms of which intermediate representations should be transferred to the server and back to the mobile device, and present a graph-based solution that can update the optimal joint mobile-cloud computation separation as the hardware configuration or user's requirements change. The practices we employ in this work can be extended to other mobile computer vision applications.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"61 1","pages":"353-360"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85490683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal Domain Neural Encoder for Video Representation Learning
Hao Hu, Zhaowen Wang, Joon-Young Lee, Zhe L. Lin, Guo-Jun Qi
We address the challenge of learning good video representations by explicitly modeling the relationship between visual concepts over time. We propose a novel Temporal Preserving Recurrent Neural Network (TPRNN) that extracts and encodes visual dynamics from frame-level features. The proposed network architecture captures temporal dynamics by tracking the ordinal relationships of co-occurring visual concepts and constructs video representations from their temporal-order patterns. The resulting video representations effectively encode the temporal information of dynamic patterns, which makes them more discriminative for human actions performed with different sequences of action patterns. We evaluate the proposed model on several real video datasets, and the results show that it outperforms the baseline models. In particular, we observe significant improvement on action classes that can only be distinguished by capturing the temporal order of action patterns.
{"title":"Temporal Domain Neural Encoder for Video Representation Learning","authors":"Hao Hu, Zhaowen Wang, Joon-Young Lee, Zhe L. Lin, Guo-Jun Qi","doi":"10.1109/CVPRW.2017.272","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.272","url":null,"abstract":"We address the challenge of learning good video representations by explicitly modeling the relationship between visual concepts in time space. We propose a novel Temporal Preserving Recurrent Neural Network (TPRNN) that extracts and encodes visual dynamics with frame-level features as input. The proposed network architecture captures temporal dynamics by keeping track of the ordinal relationship of co-occurring visual concepts, and constructs video representations with their temporal order patterns. The resultant video representations effectively encode temporal information of dynamic patterns, which makes them more discriminative to human actions performed with different sequences of action patterns. We evaluate the proposed model on several real video datasets, and the results show that it successfully outperforms the baseline models. In particular, we observe significant improvement on action classes that can only be distinguished by capturing the temporal orders of action patterns.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"7 1","pages":"2192-2199"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83789830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DyadGAN: Generating Facial Expressions in Dyadic Interactions
Yuchi Huang, Saad M. Khan
Generative Adversarial Networks (GANs) have been shown to produce synthetic face images of compelling realism. In this work, we present a conditional GAN approach for generating contextually valid facial expressions in dyadic human interactions. In contrast to previous work that conditions on facial attributes of the generated identities, we focus on dyads in an attempt to model the relationship and influence of one person's facial expressions on the reaction of the other. To this end, we introduce a two-level optimization of GANs for interviewer-interviewee dyadic interactions. In the first stage, we generate face sketches of the interviewer conditioned on the facial expressions of the interviewee. The second stage synthesizes complete face images conditioned on the sketches generated in the first stage. We demonstrate that our model is effective at generating visually compelling face images in dyadic interactions. Moreover, we show quantitatively that the facial expressions depicted in the generated interviewer images reflect valid emotional reactions to the interviewee's behavior.
{"title":"DyadGAN: Generating Facial Expressions in Dyadic Interactions","authors":"Yuchi Huang, Saad M. Khan","doi":"10.1109/CVPRW.2017.280","DOIUrl":"https://doi.org/10.1109/CVPRW.2017.280","url":null,"abstract":"Generative Adversarial Networks (GANs) have been shown to produce synthetic face images of compelling realism. In this work, we present a conditional GAN approach to generate contextually valid facial expressions in dyadic human interactions. In contrast to previous work employing conditions related to facial attributes of generated identities, we focused on dyads in an attempt to model the relationship and influence of one person’s facial expressions in the reaction of the other. To this end, we introduced a two level optimization of GANs in interviewerinterviewee dyadic interactions. In the first stage we generate face sketches of the interviewer conditioned on facial expressions of the interviewee. The second stage synthesizes complete face images conditioned on the face sketches generated in the first stage. We demonstrated that our model is effective at generating visually compelling face images in dyadic interactions. Moreover we quantitatively showed that the facial expressions depicted in the generated interviewer face images reflect valid emotional reactions to the interviewee behavior.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"30 1","pages":"2259-2266"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80287997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}