Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network
Priyanka Mandikal, R. Venkatesh Babu
DOI: https://doi.org/10.1109/WACV.2019.00117

Reconstructing a high-resolution 3D model of an object is a challenging task in computer vision. Designing scalable and light-weight architectures is crucial while addressing this problem. Existing point-cloud based reconstruction approaches directly predict the entire point cloud in a single stage. Although this technique can handle low-resolution point clouds, it is not a viable solution for generating dense, high-resolution outputs. In this work, we introduce DensePCR, a deep pyramidal network for point cloud reconstruction that hierarchically predicts point clouds of increasing resolution. Towards this end, we propose an architecture that first predicts a low-resolution point cloud, and then hierarchically increases the resolution by aggregating local and global point features to deform a grid. Our method generates point clouds that are accurate, uniform and dense. Through extensive quantitative and qualitative evaluation on synthetic and real datasets, we demonstrate that DensePCR outperforms the existing state-of-the-art point cloud reconstruction works, while also providing a light-weight and scalable architecture for predicting high-resolution outputs.
DAFE-FD: Density Aware Feature Enrichment for Face Detection
Vishwanath A. Sindagi, Vishal M. Patel
DOI: https://doi.org/10.1109/WACV.2019.00236

Recent research on face detection, focused primarily on improving the accuracy of detecting smaller faces, attempts to develop new anchor-design strategies that facilitate increased overlap between anchor boxes and ground-truth faces of smaller sizes. In this work, we approach the problem of small face detection with the motivation of enriching the feature maps using a density map estimation module. This module, inspired by recent crowd counting/density estimation techniques, estimates the per-pixel density of people/faces present in the image. The output of this module is employed to accentuate the feature maps from the backbone network using a feature enrichment module before they are used for detecting smaller faces. The proposed approach can complement recent anchor-design-based methods to further improve their results. Experiments conducted on different datasets such as WIDER, FDDB and Pascal-Faces demonstrate the effectiveness of the proposed approach.
Latent Fingerprint Enhancement Using Generative Adversarial Networks
Indu Joshi, A. Anand, Mayank Vatsa, Richa Singh, Sumantra Dutta Roy, P. Kalra
DOI: https://doi.org/10.1109/WACV.2019.00100

Latent fingerprint recognition is very useful in law enforcement and forensics applications. However, automated matching of latent fingerprints with a gallery of live-scan images is very challenging due to several compounding factors such as noisy background, poor ridge structure, and overlapping unstructured noise. In order to efficiently match latent fingerprints, an effective enhancement module is a necessity to facilitate correct minutiae extraction. In this research, we propose a Generative Adversarial Network based latent fingerprint enhancement algorithm to enhance poor-quality ridges and predict the ridge information. Experiments on two publicly available datasets, IIITD-MOLF and IIITD-MSLFD, show that the proposed enhancement algorithm improves fingerprint quality while preserving the ridge structure. It helps standard feature extraction and matching algorithms to boost latent fingerprint matching performance.
Recovering Faces From Portraits with Auxiliary Facial Attributes
Fatemeh Shiri, Xin Yu, F. Porikli, R. Hartley, Piotr Koniusz
DOI: https://doi.org/10.1109/WACV.2019.00049

Recovering a photorealistic face from an artistic portrait is a challenging task since crucial facial details are often distorted or completely lost in artistic compositions. To handle this loss, we propose an Attribute-guided Face Recovery from Portraits (AFRP) framework that utilizes a Face Recovery Network (FRN) and a Discriminative Network (DN). FRN consists of an autoencoder with residual block-embedded skip-connections and incorporates facial attribute vectors into the feature maps of input portraits at the bottleneck of the autoencoder. DN has multiple convolutional and fully-connected layers, and its role is to enforce FRN to generate authentic face images with the facial attributes dictated by the input attribute vectors. For the preservation of identity, we constrain the recovered and ground-truth faces to share similar visual features. Specifically, DN determines whether the recovered image looks like a real face and checks whether the facial attributes extracted from the recovered image are consistent with the given attributes. Our method can recover photorealistic identity-preserving faces with desired attributes from unseen stylized portraits, artistic paintings, and hand-drawn sketches. On large-scale synthesized and sketch datasets, we demonstrate that our face recovery method achieves state-of-the-art results.
GAN-Based Pose-Aware Regulation for Video-Based Person Re-Identification
Alessandro Borgia, Yang Hua, Elyor Kodirov, N. Robertson
DOI: https://doi.org/10.1109/WACV.2019.00130

Video-based person re-identification deals with the inherent difficulty of matching sequences of different lengths with unregulated and incomplete target pose/viewpoint structure. Common approaches operate either by reducing the problem to the still-image case, facing a significant information loss, or by exploiting inter-sequence temporal dependencies as in Siamese Recurrent Neural Networks or in gait analysis. However, in all cases, the inter-sequence pose/viewpoint misalignment is not considered, and the existing spatial approaches are mostly limited to the still-image context. To this end, we propose a novel approach that exploits the rich video information more effectively by accounting for the role that the changing pose/viewpoint factor plays in the sequence matching process. In particular, our approach consists of two components. The first, Weighted Fusion (WF), complements the original pose-incomplete information carried by the sequences with synthetic GAN-generated images and fuses their feature vectors into a more discriminative viewpoint-insensitive embedding. The second, Weighted-Pose Regulation (WPR), performs an explicit pose-based alignment of sequence pairs to promote coherent feature matching. Extensive experiments on two large video-based benchmark datasets show that our approach considerably outperforms existing methods.
FgGAN: A Cascaded Unpaired Learning for Background Estimation and Foreground Segmentation
Prashant W. Patil, S. Murala
DOI: https://doi.org/10.1109/WACV.2019.00193

Moving object segmentation (MOS) in videos with bad weather, irregular object motion, camera jitter, shadows, and dynamic backgrounds is still an open problem for computer vision applications. To address these issues, in this paper we propose an approach named Foreground Generative Adversarial Network (FgGAN), which builds on the recent concepts of generative adversarial networks (GANs) and unpaired training for background estimation and foreground segmentation. To the best of our knowledge, this is the first paper to use GAN-based unpaired learning for MOS. Initially, a video-wise background is estimated using a GAN-based unpaired learning network (network-I). Then, to extract the motion information related to the foreground, motion saliency is estimated from the estimated background and the current video frame. Further, the estimated motion saliency is given as input to a second GAN-based unpaired learning network (network-II) for foreground segmentation. To examine the effectiveness of the proposed FgGAN (cascaded networks I and II), challenging video categories such as dynamic background, bad weather, intermittent object motion, and shadow are collected from the ChangeDetection.net-2014 [26] database. The segmentation accuracy is evaluated qualitatively and quantitatively in terms of F-measure and percentage of wrong classification (PWC), and compared with existing state-of-the-art methods. From the experimental results, it is evident that the proposed FgGAN shows significant improvement in terms of F-measure and PWC compared to the existing state-of-the-art methods for MOS.
Conditional Generative Adversarial Refinement Networks for Unbalanced Medical Image Semantic Segmentation
Mina Rezaei, Haojin Yang, Konstantin Harmuth, C. Meinel
DOI: https://doi.org/10.1109/WACV.2019.00200

We propose a new generative adversarial architecture to mitigate the imbalanced-data problem in medical image semantic segmentation, where the majority of pixels belong to a healthy region and few belong to a lesion or non-healthy region. A model trained with imbalanced data tends to be biased towards the healthy data, which is undesirable in clinical applications, and the outputs predicted by such networks have high precision but low sensitivity. We propose a new conditional generative refinement network with three components, namely a generative network, a discriminative network, and a refinement network, to mitigate the imbalanced-data problem through ensemble learning. The generative network learns to segment at the pixel level by getting feedback from the discriminative network according to the true-positive and true-negative maps. The refinement network, in turn, learns to predict the false-positive and false-negative masks produced by the generative network, which is of significant value, especially in medical applications. The final semantic segmentation masks are then composed from the outputs of the three networks. The proposed architecture shows state-of-the-art results on LiTS-2017 for simultaneous liver and lesion segmentation, and on MDA231 for microscopic cell segmentation. We have also achieved competitive results on BraTS-2017 for brain tumor segmentation.
Starts Better and Ends Better: A Target Adaptive Image Signature Tracker
Xingchao Liu, Ce Li, Hongren Wang, Xiantong Zhen, Baochang Zhang, Qixiang Ye
DOI: https://doi.org/10.1109/WACV.2019.00024

Correlation filter (CF) trackers have achieved outstanding performance in visual object tracking tasks, in which the cosine mask plays an essential role in alleviating the boundary effects caused by the circular assumption. However, the cosine mask imposes a larger weight on its center position, which greatly affects CF trackers: their performance drops significantly if a bad starting point happens to occur. To address this issue, we propose a target adaptive image signature (TaiS) model to refine the starting point in each frame for CF trackers. Specifically, we incorporate the target prior into the image signature to build a target-specific saliency map, and iteratively refine the starting point with a closed-form solution during the tracking process. As a result, our TaiS is able to find a better starting point close to the center of targets; more importantly, it is independent of specific CF trackers and can efficiently improve their performance. Experiments on two benchmark datasets, i.e., OTB100 and UAV123, demonstrate that our TaiS consistently achieves high performance and advances the state of the art in visual tracking. The source code of our approach will be made publicly available.
Shadow Patching: Guided Image Completion for Shadow Removal
Ryan S. Hintze, B. Morse
DOI: https://doi.org/10.1109/WACV.2019.00217

Removing unwanted shadows is a common need in photo editing software. Previous methods handle some shadows well but perform poorly in cases with severe degradation (darker shadowing) because they rely on directly restoring the degraded data in the shadowed region. Image-completion algorithms can completely replace severely degraded shadowed regions and perform well with smaller-scale textures, but they often fail to reproduce larger-scale macrostructure that may still be visible in the shadowed region. This paper provides a general framework that leverages the degraded (in this case shadowed) data in a region to guide image completion: it extends the objective function commonly used in current state-of-the-art energy-minimization methods for image completion to include not only visual realism but also consistency with the original degraded content. This approach achieves realistic-looking shadow removal even in cases of severe degradation where precise recovery of the unshadowed content may not be possible. Although not demonstrated here, the generality of the approach potentially allows it to be extended to other types of localized degradation.
Observing Pianist Accuracy and Form with Computer Vision
Jangwon Lee, Bardia Doosti, Yupeng Gu, David Cartledge, David J. Crandall, C. Raphael
DOI: https://doi.org/10.1109/WACV.2019.00165

We present a first step towards developing an interactive piano tutoring system that can observe a student playing the piano and give feedback about hand movements and musical accuracy. In particular, we have two primary aims: 1) to determine which notes on a piano are being played at any moment in time, and 2) to identify which finger is pressing each note. We introduce a novel two-stream convolutional neural network that takes video and audio inputs together to detect pressed notes and finger presses. We formulate our two problems in terms of multi-task learning and extend a state-of-the-art object detection model to incorporate both audio and visual features. In addition, we introduce a novel finger identification solution based on pressed piano note information. We experimentally confirm that our approach is able to detect pressed piano keys and the piano player's fingers with high accuracy.