Akila Pemasiri, Kien Nguyen Thanh, S. Sridharan, C. Fookes
Semantic correspondence estimation, where the depicted object instances are deformed extensively from one instance to the next, is a challenging problem in computer vision that has received much attention. Unfortunately, all existing approaches require prior knowledge of the object classes present in the image environment. This is an unwanted restriction, as it can prevent the establishment of semantic correspondence across object classes in wild conditions, when it is uncertain which classes will be of interest. In contrast, in this paper we formulate the semantic correspondence estimation task as a keypoint detection process in which image-to-class classification and image-to-image correspondence are solved simultaneously. Identifying object classes within the same framework used to establish correspondence increases the approach's applicability in real-world scenarios. The use of object regions in the process also enhances accuracy while constraining the search space, thus improving overall efficiency. The new approach is compared with the state of the art on publicly available datasets to validate its capability for improved semantic correspondence estimation in wild conditions.
{"title":"Semantic Correspondence in the Wild","authors":"Akila Pemasiri, Kien Nguyen Thanh, S. Sridharan, C. Fookes","doi":"10.1109/WACV.2019.00126","DOIUrl":"https://doi.org/10.1109/WACV.2019.00126","url":null,"abstract":"Semantic correspondence estimation where the object instances depicted are deformed extensively from one instance to the next is a challenging problem in computer vision that has received much attention. Unfortunately, all existing approaches require prior knowledge of the object classes which are present in the image environment. This is an unwanted restriction as it can prevent the establishment of semantic correspondence across object classes in wild conditions when it is uncertain which classes will be of interest. In contrast, in this paper we formulate the semantic correspondence estimation task as a key point detection process in which image-to-class classification and image-to-image correspondence are solved simultaneously. Identifying object classes within the same framework to establish correspondence, increases this approach's applicability in real world scenarios. The use of object regions in the process also enhances the accuracy while constraining the search space, thus improving overall efficiency. This new approach is compared with the state-of-the-art on publicly available datasets to validate its capability for improved semantic correspondence estimation in wild conditions.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122637904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. K. Ebrahimpour, Jiayun Li, Yen-Yun Yu, Jackson Reesee, Azadeh Moghtaderi, Ming-Hsuan Yang, D. Noelle
Deep Convolutional Neural Networks (CNNs) have been repeatedly proven to perform well on image classification tasks. Object detection methods, however, are still in need of significant improvements. In this paper, we propose a new framework called Ventral-Dorsal Networks (VDNets), which is inspired by the structure of the human visual system. Roughly, the visual input signal is analyzed along two separate neural streams, one in the temporal lobe and the other in the parietal lobe. The coarse functional distinction between these streams is between object recognition (the "what" of the signal) and the extraction of location-related information (the "where" of the signal). The ventral pathway from primary visual cortex, entering the temporal lobe, is dominated by "what" information, while the dorsal pathway, into the parietal lobe, is dominated by "where" information. Inspired by this structure, we propose the integration of a "Ventral Network" and a "Dorsal Network", which are complementary: information about object identity can guide localization, and location information can guide attention to relevant image regions, improving object recognition. This new dual-network framework sharpens the focus of object detection. Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches on PASCAL VOC 2007 by 8% (mAP) and on PASCAL VOC 2012 by 3% (mAP). Moreover, a comparison of techniques on Yearbook images shows substantial qualitative and quantitative benefits of VDNet.
{"title":"Ventral-Dorsal Neural Networks: Object Detection Via Selective Attention","authors":"M. K. Ebrahimpour, Jiayun Li, Yen-Yun Yu, Jackson Reesee, Azadeh Moghtaderi, Ming-Hsuan Yang, D. Noelle","doi":"10.1109/WACV.2019.00110","DOIUrl":"https://doi.org/10.1109/WACV.2019.00110","url":null,"abstract":"Deep Convolutional Neural Networks (CNNs) have been repeatedly proven to perform well on image classification tasks. Object detection methods, however, are still in need of significant improvements. In this paper, we propose a new framework called Ventral-Dorsal Networks (VDNets) which is inspired by the structure of the human visual system. Roughly, the visual input signal is analyzed along two separate neural streams, one in the temporal lobe and the other in the parietal lobe. The coarse functional distinction between these streams is between object recognition — the \"what\" of the signal - and extracting location related information — the \"where\" of the signal. The ventral pathway from primary visual cortex, entering the temporal lobe, is dominated by \"what\" information, while the dorsal pathway, into the parietal lobe, is dominated by \"where\" information. Inspired by this structure, we propose the integration of a \"Ventral Network\" and a \"Dorsal Network\", which are complementary. Information about object identity can guide localization, and location information can guide attention to relevant image regions, improving object recognition. This new dual network framework sharpens the focus of object detection. Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches on PASCAL VOC 2007 by 8% (mAP) and PASCAL VOC 2012 by 3% (mAP). Moreover, a comparison of techniques on Yearbook images displays substantial qualitative and quantitative benefits of VDNet.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115023324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Color correction and color transfer methods have gained a lot of attention in the past few years as a means of circumventing color degradation that may occur due to various sources. In this paper, we propose a novel, simple yet powerful strategy to profoundly enhance color-distorted underwater images. The proposed approach combines both local and global information through a simple yet powerful affine transform model. Local and global information are carried through local color mapping and color covariance mapping between an input and some reference source, respectively. Several experiments on degraded underwater images demonstrate that the proposed method performs favourably against all other methods, including ones that are tailored to correcting underwater images through explicit noise modelling.
{"title":"Local Color Mapping Combined with Color Transfer for Underwater Image Enhancement","authors":"R. Protasiuk, Adel Bibi, Bernard Ghanem","doi":"10.1109/WACV.2019.00157","DOIUrl":"https://doi.org/10.1109/WACV.2019.00157","url":null,"abstract":"Color correction and color transfer methods have gained a lot of attention in the past few years to circumvent color degradation that may occur due to various sources. In this paper, we propose a novel simple yet powerful strategy to profoundly enhance color distorted underwater images. The proposed approach combines both local and global information through a simple yet powerful affine transform model. Local and global information are carried through local color mapping and color covariance mapping between an input and some reference source, respectively. Several experiments on degraded underwater images demonstrate that the proposed method performs favourably to all other methods including ones that are tailored to correcting underwater images by explicit noise modelling.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134480154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problem of person, vehicle or license plate re-identification is generally treated as a multi-shot image retrieval problem. The objective of these tasks is to learn a feature representation of query images (called a "signature") and then match these signatures against a database of template image signatures with the aid of a distance metric. In this paper, we propose a novel approach for license plate Re-ID inspired by Zero Shot Learning. The core idea is to generate template signatures for retrieval purposes from a multi-hot text encoding of license plates instead of their images. The proposed method maps license plate images and their license plate numbers to a common embedding space using a Symmetric Triplet loss function so that an image can be queried against its text. In effect, our approach makes it possible to identify license plates whose images have never been seen before, using a large text database of license plate numbers. We show that our system is capable of highly accurate and fast re-identification of license plates, and its performance compares favorably with both OCR-based approaches and state-of-the-art image-based Re-ID approaches. In addition to the advantages of avoiding manual image labeling and the ease of creating signature databases, the minimal time and storage requirements enable our system to be deployed even on portable devices.
{"title":"Zero Shot License Plate Re-Identification","authors":"Mayank Gupta, Abhinav Kumar, S. Madhvanath","doi":"10.1109/WACV.2019.00087","DOIUrl":"https://doi.org/10.1109/WACV.2019.00087","url":null,"abstract":"The problem of person, vehicle or license plate reidentification is generally treated as a multi-shot image retrieval problem. The objective of these tasks is to learn a feature representation of query images (called a \"signature\") and then use these signatures to match against a database of template image signatures with the aid of a distance metric. In this paper, we propose a novel approach for license plate Re-Id inspired by Zero Shot Learning. The core idea is to generate template signatures for retrieval purposes from a multi-hot text encoding of license plates instead of their images. The proposed method maps license plate images and their license plate numbers to a common embedding space using a Symmetric Triplet loss function so that an image can be queried against its text. In effect, our approach makes it possible to identify license plates whose images have never been seen before, using a large text database of license plate numbers. We show that our system is capable of highly accurate and fast re-identification of license plates, and its performance compares favorably to both OCR-based approaches as well as state of the art image-based Re-ID approaches. In addition to the advantages of avoiding manual image labeling and the ease of creating signature databases, the minimal time and storage requirements enable our system to be deployed even on portable devices.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123675623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Georg Waltner, M. Opitz, Horst Possegger, H. Bischof
When the number of categories grows into the thousands, large-scale image retrieval becomes an increasingly hard task. Retrieval accuracy can be improved by distance metric learning methods that separate categories in a transformed embedding space. Unlike most methods that utilize a single embedding to learn a distance metric, we build on the idea of boosted metric learning, where an embedding is split into a boosted ensemble of embeddings. While metric learning is generally applied directly on fine labels to learn embeddings, we take this one step further, incorporate hierarchical label information into the boosting framework, and show how to properly adapt loss functions for this purpose. We show that by introducing several sub-embeddings which focus on specific hierarchical classes, retrieval accuracy can be improved compared to standard flat-label embeddings. The proposed method is especially suitable for exploiting hierarchical datasets or when additional labels can be obtained without much effort. Our approach improves R@1 over state-of-the-art methods on the largest available retrieval dataset (Stanford Online Products) and sets new reference baselines for hierarchical metric learning on several other datasets (CUB-200-2011, VegFru, FruitVeg-81). We also show that the clustering quality in terms of NMI score is superior to previous works.
{"title":"HiBsteR: Hierarchical Boosted Deep Metric Learning for Image Retrieval","authors":"Georg Waltner, M. Opitz, Horst Possegger, H. Bischof","doi":"10.1109/WACV.2019.00069","DOIUrl":"https://doi.org/10.1109/WACV.2019.00069","url":null,"abstract":"When the number of categories is growing into thousands, large-scale image retrieval becomes an increasingly hard task. Retrieval accuracy can be improved by learning distance metric methods that separate categories in a transformed embedding space. Unlike most methods that utilize a single embedding to learn a distance metric, we build on the idea of boosted metric learning, where an embedding is split into a boosted ensemble of embeddings. While in general metric learning is directly applied on fine labels to learn embeddings, we take this one step further and incorporate hierarchical label information into the boosting framework and show how to properly adapt loss functions for this purpose. We show that by introducing several sub-embeddings which focus on specific hierarchical classes, the retrieval accuracy can be improved compared to standard flat label embeddings. The proposed method is especially suitable for exploiting hierarchical datasets or when additional labels can be retrieved without much effort. Our approach improves R@1 over state-of-the-art methods on the biggest available retrieval dataset (Stanford Online Products) and sets new reference baselines for hierarchical metric learning on several other datasets (CUB-200-2011, VegFru, FruitVeg-81). We show that the clustering quality in terms of NMI score is superior to previous works.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124992277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Imran Ahmed, M. Eramian, I. Ovsyannikov, William van der Kamp, K. Nielsen, H. Duddu, Arafia Rumali, S. Shirtliffe, K. Bett
Unmanned Aerial Vehicles (UAVs) paired with image detection and segmentation techniques can be used to extract plant phenotype information from individual breeding or research plots. Each plot contains plants of a single genetic line. Breeders are interested in selecting lines with preferred phenotypes (physical traits) that increase crop yield or resilience. Automated detection and segmentation of plots would enable automatic monitoring and quantification of plot phenotypes, allowing a faster selection process that requires far fewer person-hours than manual assessment. A detection algorithm based on Laplacian of Gaussian (LoG) blob detection and a segmentation algorithm based on a combination of unsupervised clustering and random walker image segmentation are proposed to detect and segment lentil plots from multi-spectral aerial images. Our algorithm detects and segments lentil plots from normalized difference vegetation index (NDVI) images. The detection algorithm exhibited an average precision and recall of 96.3% and 97.2%, respectively. The average Dice similarity coefficient between a detected segmented plot and its ground truth was 0.906.
{"title":"Automatic Detection and Segmentation of Lentil Crop Breeding Plots From Multi-Spectral Images Captured by UAV-Mounted Camera","authors":"Imran Ahmed, M. Eramian, I. Ovsyannikov, William van der Kamp, K. Nielsen, H. Duddu, Arafia Rumali, S. Shirtliffe, K. Bett","doi":"10.1109/WACV.2019.00183","DOIUrl":"https://doi.org/10.1109/WACV.2019.00183","url":null,"abstract":"Unmanned Aerial Vehicles (UAVs) paired with image detection and segmentation techniques can be used to extract plant phenotype information of individual breeding or research plots. Each plot contains plants of a single genetic line. Breeders are interested in selecting lines with preferred phenotypes (physical traits) that increase crop yield or resilience. Automated detection and segmentation of plots would enable automatic monitoring and quantification of plot phenotypes, allowing a faster selection process that requires much fewer person-hours compared with manual assessment. A detection algorithm based on Laplacian of Gaussian (LoG) blob detection and a segmentation algorithm based on a combination of unsupervised clustering and random walker image segmentation are proposed to detect and segment lentil plots from multi-spectral aerial images. Our algorithm detects and segments lentil plots from normalized difference vegetative index (NDVI) images. The detection algorithm exhibited an average precision and recall of 96.3% and 97.2% respectively. The average Dice similarity coefficient between a detected segmented plot and its ground truth was 0.906.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121515103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. In this paper, to address this problem, we propose a new framework for recognizing object-related human actions with graph convolutional networks using human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectively sampling informative frames in a video, which include human joints with high confidence scores obtained during pose estimation. The skeletal graphs generated from the sampled frames represent human poses related to the object position in both the spatial and temporal domains, and these graphs are used as inputs to the graph convolutional networks. Through experiments on an open benchmark and our own datasets, we verify the validity of our framework, showing that our method outperforms the state-of-the-art method for skeleton-based action recognition.
{"title":"Skeleton-Based Action Recognition of People Handling Objects","authors":"Sunoh Kim, Kimin Yun, Jongyoul Park, J. Choi","doi":"10.1109/WACV.2019.00014","DOIUrl":"https://doi.org/10.1109/WACV.2019.00014","url":null,"abstract":"In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. In this paper, to address this problem, we propose a new framework for recognizing object-related human actions by graph convolutional networks using human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectively sampling the informative frames in a video, which include human joints with high confidence scores obtained in pose estimation. The skeletal graphs generated from the sampled frames represent human poses related to the object position in both the spatial and temporal domains, and these graphs are used as inputs to the graph convolutional networks. Through experiments over an open benchmark and our own data sets, we verify the validity of our framework in that our method outperforms the state-of-the-art method for skeleton-based action recognition.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115387490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a method to generate fashion product images that are consistent with a given set of fashion attributes. Since distinct fashion attributes are related to different local sub-regions of a product image, we propose a generative adversarial network with an attentional discriminator. The attribute-attended loss signal from the discriminator leads the generator to produce images that are more consistent with the given attributes. In addition, we present a generator based on a Product-of-Gaussian formulation to encode the composition of fashion attributes in an effective way. To verify whether the proposed model generates consistent images, an oracle attribute classifier is trained to judge the consistency between the given attributes and the generated images. Our model significantly outperforms the baseline model in terms of correctness measured by the pre-trained oracle classifier. We report not only quantitative performance but also synthesized images with various combinations of attributes, allowing comparison with the baseline model.
{"title":"Fashion Attributes-to-Image Synthesis Using Attention-Based Generative Adversarial Network","authors":"Hanbit Lee, Sang-goo Lee","doi":"10.1109/WACV.2019.00055","DOIUrl":"https://doi.org/10.1109/WACV.2019.00055","url":null,"abstract":"In this paper, we present a method to generate fashion product images those are consistent with a given set of fashion attributes. Since distinct fashion attributes are related to different local sub-regions of a product image, we propose to use generative adversarial network with attentional discriminator. The attribute-attended loss signal from discriminator leads generator to generate more consistent images with given attributes. In addition, we present a generator based on Product-of-Gaussian to encode the composition of fashion attributes in effective way. To verify the proposed model whether it generates consistent image, an oracle attribute classifier is trained and judge the consistency of given attributes and the generated images. Our model significantly outperforms the baseline model in terms of correctness measured by the pre-trained oracle classifier. We show not only qualitative performance but also synthesized images with various combinations of attributes, so we can compare them with baseline model.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128605221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patch-based denoising algorithms like BM3D have achieved outstanding performance. An important idea behind the success of these methods is to exploit the recurrence of similar patches in an input image to estimate the underlying image structures. However, in these algorithms, the similar patches used for denoising are obtained via Nearest Neighbour Search (NNS) and are sometimes not optimal. First, due to the existence of noise, NNS can select patches whose noise patterns are similar to that of the reference patch. Second, the unreliable noisy pixels in digital images can bias the patch searching process and result in a loss of color fidelity in the final denoising result. We observe that, given a set of good similar patches, their distribution is not necessarily centered at the noisy reference patch and can be approximated by a Gaussian component. Based on this observation, we present a patch searching method that clusters similar patch candidates into patch groups using Gaussian Mixture Model-based clustering and selects the patch group that contains the reference patch as the final set of patches for denoising. We also use an unreliable pixel estimation algorithm to pre-process the input noisy images to further improve the patch searching. Our experiments show that our approach better captures the underlying patch structures and consistently enables state-of-the-art patch-based denoising algorithms, such as BM3D, LPCA and PLOW, to better denoise images by providing them with patches found by our approach, without modifying these algorithms.
{"title":"Good Similar Patches for Image Denoising","authors":"Si Lu","doi":"10.1109/WACV.2019.00205","DOIUrl":"https://doi.org/10.1109/WACV.2019.00205","url":null,"abstract":"Patch-based denoising algorithms like BM3D have achieved outstanding performance. An important idea for the success of these methods is to exploit the recurrence of similar patches in an input image to estimate the underlying image structures. However, in these algorithms, the similar patches used for denoising are obtained via Nearest Neighbour Search (NNS) and are sometimes not optimal. First, due to the existence of noise, NNS can select similar patches with similar noise patterns to the reference patch. Second, the unreliable noisy pixels in digital images can bring a bias to the patch searching process and result in a loss of color fidelity in the final denoising result. We observe that given a set of good similar patches, their distribution is not necessarily centered at the noisy reference patch and can be approximated by a Gaussian component. Based on this observation, we present a patch searching method that clusters similar patch candidates into patch groups using Gaussian Mixture Model-based clustering, and selects the patch group that contains the reference patch as the final patches for denoising. We also use an unreliable pixel estimation algorithm to pre-process the input noisy images to further improve the patch searching. Our experiments show that our approach can better capture the underlying patch structures and can consistently enable the state-of-the-art patch-based denoising algorithms, such as BM3D, LPCA and PLOW, to better denoise images by providing them with patches found by our approach while without modifying these algorithms.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131207651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Youngeun Kim, Seunghyeon Kim, Taekyung Kim, Changick Kim
These days, Convolutional Neural Networks are widely used in semantic segmentation. However, since CNN-based segmentation networks produce low-resolution outputs with rich semantic information, it is inevitable that spatial details (e.g., small objects and fine boundary information) of the segmentation results will be lost. To address this problem, motivated by a variational approach to image segmentation (i.e., level set theory), we propose a novel loss function, called the level set loss, which is designed to refine the spatial details of segmentation results. To deal with multiple classes in an image, we first decompose the ground truth into binary images. Note that each binary image consists of background and the regions belonging to one class. Then we convert level set functions into class probability maps and calculate the energy for each class. The network is trained to minimize the weighted sum of the level set loss and the cross-entropy loss. The proposed level set loss improves the spatial details of segmentation results in a time- and memory-efficient way. Furthermore, our experimental results show that the proposed loss function achieves better performance than previous approaches.
{"title":"CNN-Based Semantic Segmentation Using Level Set Loss","authors":"Youngeun Kim, Seunghyeon Kim, Taekyung Kim, Changick Kim","doi":"10.1109/WACV.2019.00191","DOIUrl":"https://doi.org/10.1109/WACV.2019.00191","url":null,"abstract":"Thesedays, Convolutional Neural Networks are widely used in semantic segmentation. However, since CNN-based segmentation networks produce low-resolution outputs with rich semantic information, it is inevitable that spatial details (e.g., small objects and fine boundary information) of segmentation results will be lost. To address this problem, motivated by a variational approach to image segmentation (i.e., level set theory), we propose a novel loss function called the level set loss which is designed to refine spatial details of segmentation results. To deal with multiple classes in an image, we first decompose the ground truth into binary images. Note that each binary image consists of background and regions belonging to a class. Then we convert level set functions into class probability maps and calculate the energy for each class. The network is trained to minimize the weighted sum of the level set loss and the cross-entropy loss. The proposed level set loss improves the spatial details of segmentation results in a time and memory efficient way. Furthermore, our experimental results show that the proposed loss function achieves better performance than previous approaches.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130354915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}