Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, D. Doermann
Rapidly decreasing computation and memory costs have recently driven the success of many applications in the field of deep learning. Practical deployment of deep learning on resource-limited hardware, such as embedded devices and smartphones, however, remains challenging. For binary convolutional networks, the reason lies in the degraded representation caused by binarizing full-precision filters. To address this problem, we propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features via our circulant back propagation (CBP). The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), which leads to new Circulant Binary Convolutional Networks (CBCNs). Extensive experiments confirm that the performance gap between 1-bit and full-precision DCNNs is minimized by increasing the filter diversity, which further increases the representational ability of our networks. Our experiments on ImageNet show that CBCNs achieve 61.4% top-1 accuracy with ResNet18. Compared to state-of-the-art methods such as XNOR, CBCNs achieve up to 10% higher top-1 accuracy with more powerful representational ability.
{"title":"Circulant Binary Convolutional Networks: Enhancing the Performance of 1-Bit DCNNs With Circulant Back Propagation","authors":"Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, D. Doermann","doi":"10.1109/CVPR.2019.00280","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00280","url":null,"abstract":"The rapidly decreasing computation and memory cost has recently driven the success of many applications in the field of deep learning. Practical applications of deep learning in resource-limited hardware, such as embedded devices and smart phones, however, remain challenging. For binary convolutional networks, the reason lies in the degraded representation caused by binarizing full-precision filters. To address this problem, we propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features via our circulant back propagation (CBP). The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), which leads to new Circulant Binary Convolutional Networks (CBCNs). Extensive experiments confirm that the performance gap between the 1-bit and full-precision DCNNs is minimized by increasing the filter diversity, which further increases the representational ability in our networks. Our experiments on ImageNet show that CBCNs achieve 61.4% top-1 accuracy with ResNet18. Compared to the state-of-the-art such as XNOR, CBCNs can achieve up to 10% higher top-1 accuracy with more powerful representational ability.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"14 1","pages":"2686-2694"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89522932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenyu Zhang, Zhen Cui, Chunyan Xu, Yan Yan, N. Sebe, Jian Yang
In this paper, we propose a novel Pattern-Affinitive Propagation (PAP) framework to jointly predict depth, surface normal and semantic segmentation. The motivation behind it comes from the statistical observation that pattern-affinitive pairs recur frequently across different tasks as well as within a task. Thus, we can conduct two types of propagation, cross-task propagation and task-specific propagation, to adaptively diffuse those similar patterns. The former integrates cross-task affinity patterns to adapt to each task through the calculation of non-local relationships. The latter then performs an iterative diffusion in the feature space so that the cross-task affinity patterns can be widely spread within the task. Accordingly, the learning of each task can be regularized and boosted by the complementary task-level affinities. Extensive experiments demonstrate the effectiveness and superiority of our method on the three joint tasks. Meanwhile, we achieve state-of-the-art or competitive results on three related datasets: NYUD-v2, SUN-RGBD and KITTI.
{"title":"Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation","authors":"Zhenyu Zhang, Zhen Cui, Chunyan Xu, Yan Yan, N. Sebe, Jian Yang","doi":"10.1109/CVPR.2019.00423","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00423","url":null,"abstract":"In this paper, we propose a novel Pattern-Affinitive Propagation (PAP) framework to jointly predict depth, surface normal and semantic segmentation. The motivation behind it comes from the statistic observation that pattern-affinitive pairs recur much frequently across different tasks as well as within a task. Thus, we can conduct two types of propagations, cross-task propagation and task-specific propagation, to adaptively diffuse those similar patterns. The former integrates cross-task affinity patterns to adapt to each task therein through the calculation on non-local relationships. Next the latter performs an iterative diffusion in the feature space so that the cross-task affinity patterns can be widely-spread within the task. Accordingly, the learning of each task can be regularized and boosted by the complementary task-level affinities. Extensive experiments demonstrate the effectiveness and the superiority of our method on the joint three tasks. Meanwhile, we achieve the state-of-the-art or competitive results on the three related datasets, NYUD-v2, SUN-RGBD and KITTI.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"63 6 1","pages":"4101-4110"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76694877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junjun He, Zhongying Deng, Lei Zhou, Yali Wang, Y. Qiao
Recent studies have shown that context features can significantly improve the performance of deep semantic segmentation networks. Current context-based segmentation methods differ from each other in how they construct context features, and they perform differently in practice. This paper first introduces three desirable properties of context features for the segmentation task. In particular, we find that Global-guided Local Affinity (GLA) plays a vital role in constructing effective context features, while this property has been largely ignored in previous works. Based on this analysis, this paper proposes the Adaptive Pyramid Context Network (APCNet) for semantic segmentation. APCNet adaptively constructs multi-scale contextual representations with multiple well-designed Adaptive Context Modules (ACMs). Specifically, each ACM leverages a global image representation as guidance to estimate the local affinity coefficients for each sub-region, and then calculates a context vector from these affinities. We empirically evaluate APCNet on three semantic segmentation and scene parsing datasets: PASCAL VOC 2012, Pascal-Context, and ADE20K. Experimental results show that APCNet achieves state-of-the-art performance on all three benchmarks, and obtains a new record of 84.2% on the PASCAL VOC 2012 test set without MS COCO pre-training or any post-processing.
{"title":"Adaptive Pyramid Context Network for Semantic Segmentation","authors":"Junjun He, Zhongying Deng, Lei Zhou, Yali Wang, Y. Qiao","doi":"10.1109/CVPR.2019.00770","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00770","url":null,"abstract":"Recent studies witnessed that context features can significantly improve the performance of deep semantic segmentation networks. Current context based segmentation methods differ with each other in how to construct context features and perform differently in practice. This paper firstly introduces three desirable properties of context features in segmentation task. Specially, we find that Global-guided Local Affinity (GLA) can play a vital role in constructing effective context features, while this property has been largely ignored in previous works. Based on this analysis, this paper proposes Adaptive Pyramid Context Network (APCNet) for semantic segmentation. APCNet adaptively constructs multi-scale contextual representations with multiple well-designed Adaptive Context Modules (ACMs). Specifically, each ACM leverages a global image representation as a guidance to estimate the local affinity coefficients for each sub-region, and then calculates a context vector with these affinities. We empirically evaluate our APCNet on three semantic segmentation and scene parsing datasets, including PASCAL VOC 2012, Pascal-Context, and ADE20K dataset. Experimental results show that APCNet achieves state-of-the-art performance on all three benchmarks, and obtains a new record 84.2% on PASCAL VOC 2012 test set without MS COCO pre-trained and any post-processing.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"21 1","pages":"7511-7520"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75383395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhaofan Qiu, Ting Yao, C. Ngo, Xinmei Tian, Tao Mei
Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback becomes even more pronounced for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework to boost spatio-temporal representation learning with Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. The diffusions allow the two aspects of information, i.e., the localized and the holistic, to interact, enabling more powerful representation learning. Furthermore, a kernelized classifier is introduced to combine the representations from the two aspects for video recognition. Our LGD networks achieve clear improvements over the best competitors on the large-scale Kinetics-400 and Kinetics-600 video classification datasets, by 3.5% and 0.7%, respectively. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection. Superior performance over several state-of-the-art techniques is reported on these benchmarks.
{"title":"Learning Spatio-Temporal Representation With Local and Global Diffusion","authors":"Zhaofan Qiu, Ting Yao, C. Ngo, Xinmei Tian, Tao Mei","doi":"10.1109/CVPR.2019.01233","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01233","url":null,"abstract":"Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations while ignoring the large-range dependency. Such drawback becomes even worse particularly for video recognition, since video is an information-intensive media with complex temporal variations. In this paper, we present a novel framework to boost the spatio-temporal representation learning by Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. Diffusions effectively interact two aspects of information, i.e., localized and holistic, for more powerful way of representation learning. Furthermore, a kernelized classifier is introduced to combine the representations from two aspects for video recognition. Our LGD networks achieve clear improvements on the large-scale Kinetics-400 and Kinetics-600 video classification datasets against the best competitors by 3.5% and 0.7%. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection tasks. Superior performances over several state-of-the-art techniques on these benchmarks are reported.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"66 1","pages":"12048-12057"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77967732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi Zhou, Xiaodong He, Lei Huang, Li Liu, Fan Zhu, Shanshan Cui, Ling Shao
Medical image analysis has two important research areas: disease grading and fine-grained lesion segmentation. Although the former problem often relies on the latter, the two are usually studied separately. Disease severity grading can be treated as a classification problem, which only requires image-level annotations, while lesion segmentation requires stronger pixel-level annotations. However, pixel-wise annotation of medical images is highly time-consuming and requires domain experts. In this paper, we propose a collaborative learning method that jointly improves the performance of disease grading and lesion segmentation by semi-supervised learning with an attention mechanism. Given a small set of pixel-level annotated data, a multi-lesion mask generation model first performs the traditional semantic segmentation task. Then, based on the initially predicted lesion maps for large quantities of image-level annotated data, a lesion-attentive disease grading model is designed to improve the severity classification accuracy. Meanwhile, the lesion attention model can refine the lesion maps using class-specific information to fine-tune the segmentation model in a semi-supervised manner. An adversarial architecture is also integrated for training. With extensive experiments on a representative medical problem, diabetic retinopathy (DR), we validate the effectiveness of our method and achieve consistent improvements over state-of-the-art methods on three public datasets.
{"title":"Collaborative Learning of Semi-Supervised Segmentation and Classification for Medical Images","authors":"Yi Zhou, Xiaodong He, Lei Huang, Li Liu, Fan Zhu, Shanshan Cui, Ling Shao","doi":"10.1109/CVPR.2019.00218","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00218","url":null,"abstract":"Medical image analysis has two important research areas: disease grading and fine-grained lesion segmentation. Although the former problem often relies on the latter, the two are usually studied separately. Disease severity grading can be treated as a classification problem, which only requires image-level annotations, while the lesion segmentation requires stronger pixel-level annotations. However, pixel-wise data annotation for medical images is highly time-consuming and requires domain experts. In this paper, we propose a collaborative learning method to jointly improve the performance of disease grading and lesion segmentation by semi-supervised learning with an attention mechanism. Given a small set of pixel-level annotated data, a multi-lesion mask generation model first performs the traditional semantic segmentation task. Then, based on initially predicted lesion maps for large quantities of image-level annotated data, a lesion attentive disease grading model is designed to improve the severity classification accuracy. Meanwhile, the lesion attention model can refine the lesion maps using class-specific information to fine-tune the segmentation model in a semi-supervised manner. An adversarial architecture is also integrated for training. With extensive experiments on a representative medical problem called diabetic retinopathy (DR), we validate the effectiveness of our method and achieve consistent improvements over state-of-the-art methods on three public datasets.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"14 1","pages":"2074-2083"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74890530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matthew Trager, M. Hebert, J. Ponce
We present a coordinate-free description of Carlsson-Weinshall duality between scene points and camera pinholes and use it to derive a new characterization of primal/dual multi-view geometry. In the case of three views, a particular set of reduced trilinearities provides a novel parameterization of camera geometry that, unlike existing ones, is subject only to very simple internal constraints. These trilinearities lead to new "quasi-linear" algorithms for primal and dual structure from motion. We include some preliminary experiments with real and synthetic data.
"Coordinate-Free Carlsson-Weinshall Duality and Relative Multi-View Geometry," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 225-233. doi:10.1109/CVPR.2019.00031
Guosen Xie, Li Liu, Xiaobo Jin, Fan Zhu, Zheng Zhang, Jie Qin, Yazhou Yao, Ling Shao
Zero-shot learning (ZSL) aims to classify images from unseen categories using only seen-class images as training data. Existing works on ZSL mainly leverage global features or learn global regions, from which the embeddings to the semantic space are constructed. However, few of them study the discriminative power implied in local image regions (parts), which, in some sense, correspond to semantic attributes, have stronger discrimination than attributes, and can thus assist the semantic transfer between seen and unseen classes. In this paper, to discover such (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. Specifically, AREN is end-to-end trainable and consists of two network branches, i.e., the attentive region embedding (ARE) stream and the attentive compressed second-order embedding (ACSE) stream. ARE is capable of discovering multiple part regions under the guidance of the attention and the compatibility loss. Moreover, a novel adaptive thresholding mechanism is proposed to suppress redundant (such as background) attention regions. To further guarantee more stable semantic transfer from the perspective of second-order collaboration, ACSE is incorporated into AREN. In comprehensive evaluations on four benchmarks, our models achieve state-of-the-art performance under the ZSL setting and compelling results under the generalized ZSL setting.
{"title":"Attentive Region Embedding Network for Zero-Shot Learning","authors":"Guosen Xie, Li Liu, Xiaobo Jin, Fan Zhu, Zheng Zhang, Jie Qin, Yazhou Yao, Ling Shao","doi":"10.1109/CVPR.2019.00961","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00961","url":null,"abstract":"Zero-shot learning (ZSL) aims to classify images from unseen categories, by merely utilizing seen class images as the training data. Existing works on ZSL mainly leverage the global features or learn the global regions, from which, to construct the embeddings to the semantic space. However, few of them study the discrimination power implied in local image regions (parts), which, in some sense, correspond to semantic attributes, have stronger discrimination than attributes, and can thus assist the semantic transfer between seen/unseen classes. In this paper, to discover (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. Specifically, AREN is end-to-end trainable and consists of two network branches, i.e., the attentive region embedding (ARE) stream, and the attentive compressed second-order embedding (ACSE) stream. ARE is capable of discovering multiple part regions under the guidance of the attention and the compatibility loss. Moreover, a novel adaptive thresholding mechanism is proposed for suppressing redundant (such as background) attention regions. To further guarantee more stable semantic transfer from the perspective of second-order collaboration, ACSE is incorporated into the AREN. In the comprehensive evaluations on four benchmarks, our models achieve state-of-the-art performances under ZSL setting, and compelling results under generalized ZSL setting.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"27 1","pages":"9376-9385"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75735818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fandong Zhang, Ling Luo, Xinwei Sun, Zhen Zhou, Xiuli Li, Yizhou Yu, Yizhou Wang
Accurate microcalcification (μC) detection is of great importance because μCs appear in a high proportion of early breast cancers. Most previous μC detection methods are discriminative models, where classifiers are exploited to distinguish μCs from other backgrounds. However, it is still challenging for these methods to separate μCs from the large amount of normal tissue because they are tiny (at most 14 pixels). Generative methods can precisely model the normal tissue and regard abnormal regions as outliers, but they fail to further distinguish μCs from other anomalies, e.g., vessel calcifications. In this paper, we propose a hybrid approach that takes advantage of both generative and discriminative models. First, a generative model named the Anomaly Separation Network (ASN) is used to generate candidate μCs. ASN contains two major components: a deep convolutional encoder-decoder network is built to learn the image reconstruction mapping, and a t-test loss function is designed to separate the distributions of the reconstruction residuals of μCs from those of normal tissue. Second, a discriminative model is cascaded to distinguish μCs from the false positives. Finally, to verify the effectiveness of our method, we conduct experiments on both public and in-house datasets, which demonstrate that our approach outperforms previous state-of-the-art methods.
"Cascaded Generative and Discriminative Learning for Microcalcification Detection in Breast Mammograms," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12570-12578. doi:10.1109/CVPR.2019.01286
Xionghui Wang, Jianfang Hu, J. Lai, Jianguo Zhang, Weishi Zheng
The goal of early action prediction is to recognize actions from partially observed videos with incomplete action executions, which is quite different from action recognition. Predicting early actions is very challenging since the partially observed videos do not contain enough action information for recognition. In this paper, we aim to improve early action prediction by proposing a novel teacher-student learning framework. Our framework involves a teacher model for recognizing actions from full videos, a student model for predicting early actions from partial videos, and a teacher-student learning block for distilling progressive knowledge from teacher to student across the two tasks. Extensive experiments on three public action datasets show that the proposed progressive teacher-student learning framework consistently improves the performance of the early action prediction model. We also report state-of-the-art performance for early action prediction on all of these datasets.
{"title":"Progressive Teacher-Student Learning for Early Action Prediction","authors":"Xionghui Wang, Jianfang Hu, J. Lai, Jianguo Zhang, Weishi Zheng","doi":"10.1109/CVPR.2019.00367","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00367","url":null,"abstract":"The goal of early action prediction is to recognize actions from partially observed videos with incomplete action executions, which is quite different from action recognition. Predicting early actions is very challenging since the partially observed videos do not contain enough action information for recognition. In this paper, we aim at improving early action prediction by proposing a novel teacher-student learning framework. Our framework involves a teacher model for recognizing actions from full videos, a student model for predicting early actions from partial videos, and a teacher-student learning block for distilling progressive knowledge from teacher to student, crossing different tasks. Extensive experiments on three public action datasets show that the proposed progressive teacher-student learning framework can consistently improve performance of early action prediction model. We have also reported the state-of-the-art performances for early action prediction on all of these sets.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"10 1","pages":"3551-3560"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72775209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ping Wei, Huan Li, Ping Hu
Handwritten signature verification is an important technique for many financial, commercial, and forensic applications. In this paper, we propose an inverse discriminative network (IDN) for writer-independent handwritten signature verification, which aims to determine whether a test signature is genuine or forged with respect to a reference signature. The IDN model contains four weight-shared neural network streams: the two that receive the original signature images are the discriminative streams, and the other two, which process the gray-inverted images, form the inverse streams. Multiple paths of attention modules connect the discriminative streams and the inverse streams to propagate messages. With the inverse streams and the multi-path attention modules, the IDN model intensifies the information that is effective for signature verification. Since there was no proper Chinese signature dataset in the community, we collected a large-scale Chinese signature dataset with approximately 29,000 images of 749 individuals' signatures. We test our method on the Chinese signature dataset and three other signature datasets in different languages: CEDAR, BHSig-B, and BHSig-H. Experiments demonstrate the strength and potential of our method.
"Inverse Discriminative Networks for Handwritten Signature Verification," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5757-5765. doi:10.1109/CVPR.2019.00591