Image Fusion Through Linear Embeddings
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506168 | 2021 IEEE International Conference on Image Processing (ICIP)
Oguzhan Ulucan, Diclehan Karakaya, Mehmet Türkan
This paper proposes an effective technique for multi-exposure image fusion and visible-infrared image fusion. Multi-exposure fusion algorithms generally extract faulty weight maps when the input stack contains multiple and/or severely over-exposed images. To overcome this issue, an alternative method for weight map characterization and refinement is developed, building on linear embeddings of images and adaptive morphological masking. This framework is then extended to the visible-infrared image fusion problem. Comprehensive experimental comparisons demonstrate that the proposed algorithm significantly enhances the quality of the fused image, both statistically and visually.
Cmdm-Vac: Improving A Perceptual Quality Metric For 3D Graphics By Integrating A Visual Attention Complexity Measure
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506662 | 2021 IEEE International Conference on Image Processing (ICIP)
Y. Nehmé, Mona Abid, G. Lavoué, Matthieu Perreira Da Silva, P. Callet
Many objective quality metrics have been proposed over the years to automate the task of subjective quality assessment. However, few of them are designed for 3D graphical content with appearance attributes; existing ones are based on geometry and color measures, yet they ignore the visual saliency of the objects. In this paper, we combine an optimal subset of geometry-based and color-based features, provided by a state-of-the-art quality metric for 3D colored meshes, with a visual attention complexity feature adapted to 3D graphics. The performance of the proposed metric is evaluated on a dataset of 80 meshes with diffuse colors, generated from 5 source models corrupted by commonly used geometry and color distortions. We show that the attentional complexity feature brings a significant gain in performance and better stability.
AKFNET: An Anatomical Knowledge Embedded Few-Shot Network For Medical Image Segmentation
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506721 | 2021 IEEE International Conference on Image Processing (ICIP)
Yanan Wei, Jiang Tian, Cheng Zhong, Zhongchao Shi
Automated organ segmentation in CT scans is an essential prerequisite for many clinical applications, such as computer-aided diagnosis and intervention. As medical data annotation requires massive human labor from experienced radiologists, how to effectively improve segmentation performance with limited annotated training data remains a challenging problem. Few-shot learning imitates the learning process of humans and is a promising way to overcome this challenge. In this paper, we propose a novel anatomical knowledge embedded few-shot network (AKFNet), in which an anatomical knowledge embedded support unit (AKSU) is carefully designed to embed anatomical priors from support images into the model. Moreover, a similarity guidance alignment unit (SGAU) is proposed to impose a mutual alignment between the support and query sets. As a result, AKFNet fully exploits anatomical knowledge and exhibits good learning capability. Without bells and whistles, AKFNet outperforms state-of-the-art methods with a 0.84-1.76% Dice increase. Transfer learning experiments further verify its learning capability.
A Unified Density-Driven Framework For Effective Data Denoising And Robust Abstention
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506754 | 2021 IEEE International Conference on Image Processing (ICIP)
Krishanu Sarker, Xiulong Yang, Yang Li, S. Belkasim, Shihao Ji
The success of Deep Neural Networks (DNNs) highly depends on data quality. Moreover, predictive uncertainty reduces the reliability of DNNs in real-world applications. In this paper, we address these two issues by proposing a unified filtering framework that leverages the underlying data density to effectively denoise training data and avoid predicting on confusing samples. The proposed framework differentiates noisy from clean data samples without modifying existing DNN architectures or loss functions. Extensive experiments on multiple benchmark datasets and the recent COVIDx dataset demonstrate the effectiveness of our framework over state-of-the-art (SOTA) methods in denoising training data and abstaining on uncertain test data.
Using Fisheye Camera For Cost-Effective Multi-View People Localization
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506204 | 2021 IEEE International Conference on Image Processing (ICIP)
Yueh-Cheng Huang, Chin-Wei Liu, Jen-Hui Chuang
In advanced multi-view video surveillance systems, people localization is usually a crucial part of the complete system and needs to be accomplished in a short time to reserve sufficient processing time for subsequent high-level analysis. As the surveillance area increases, a large number of cameras is required for multi-view people localization. To lower the equipment cost and setup time, we incorporate a fisheye (or wide-angle) camera into an efficient vanishing-point-based line sampling scheme for people localization, by ensuring the fisheye camera looks downward so that its principal point becomes the vanishing point of vertical lines. Experimental results show that the fisheye camera can (i) achieve localization accuracy comparable or superior to that obtained using ordinary cameras, (ii) reduce the camera count by 75% on average while covering a monitored area of the same or larger size, and (iii) greatly simplify the camera installation process.
Attend, Correct And Focus: A Bidirectional Correct Attention Network For Image-Text Matching
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506438 | 2021 IEEE International Conference on Image Processing (ICIP)
Yang Liu, Huaqiu Wang, Fanyang Meng, Mengyuan Liu, Hong Liu
The image-text matching task aims to learn fine-grained correspondences between images and sentences. Existing methods use attention mechanisms to learn these correspondences by attending to all fragments without considering the relationship between fragments and global semantics, which inevitably leads to semantic misalignment among irrelevant fragments. To this end, we propose a Bidirectional Correct Attention Network (BCAN), which leverages global similarities and local similarities to reassign the attention weights and avoid such semantic misalignment. Specifically, we introduce a global correct unit to correct the attention focused on relevant fragments in irrelevant semantics, and a local correct unit to correct the attention focused on irrelevant fragments in relevant semantics. Experiments on the Flickr30K and MSCOCO datasets verify the effectiveness of the proposed BCAN, which outperforms both previous attention-based methods and state-of-the-art methods. Code can be found at: https://github.com/liuyyy111/BCAN.
A Deep Learning Method for Frame Selection in Videos for Structure from Motion Pipelines
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506227 | 2021 IEEE International Conference on Image Processing (ICIP)
F. Banterle, R. Gong, M. Corsini, F. Ganovelli, L. Gool, Paolo Cignoni
Structure-from-Motion (SfM) from the frames of a video sequence can be challenging: there is a lot of redundant information, the computational time increases quadratically with the number of frames, and low-quality images (e.g., blurred frames) can decrease the final quality of the reconstruction. To overcome these issues, we present a novel deep-learning architecture that speeds up SfM by selecting frames using a predicted sub-sampling frequency. This architecture is general and can learn/distill the knowledge of any algorithm for selecting frames from a video for generating high-quality reconstructions. One key advantage is that our architecture runs in real time, saving computation while keeping high-quality results.
Unsupervised Person Re-Identification Via Global-Level And Patch-Level Discriminative Feature Learning
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506220 | 2021 IEEE International Conference on Image Processing (ICIP)
Zongzhe Sun, Feng Zhao, Feng Wu
Due to the lack of labeled data, it is usually difficult for an unsupervised person re-identification (re-ID) model to learn discriminative features. To address this issue, we propose a global-level and patch-level unsupervised feature learning framework that utilizes both global and local information to obtain more discriminative features. For global-level learning, we design a global similarity-based loss (GSL) to leverage the similarities between whole images. Together with a memory-based non-parametric classifier, the GSL pulls credible samples closer to help train a discriminative model. For patch-level learning, we use a patch generation module to produce different patches. By applying a patch-based discriminative feature learning loss and an image-level feature learning loss, the patch branch of the network learns more representative patch features. Combining global-level and patch-level learning, we obtain a more discriminative re-ID model. Experimental results on the Market-1501 and DukeMTMC-reID datasets validate the superiority and effectiveness of our method for unsupervised person re-ID.
Analysis of the Novel Transformer Module Combination for Scene Text Recognition
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506779 | 2021 IEEE International Conference on Image Processing (ICIP)
Yeon-Gyu Kim, Hyunsung Kim, Minseok Kang, Hyug-Jae Lee, Rokkyu Lee, Gunhan Park
Various methods for scene text recognition (STR) are proposed every year. These methods have dramatically increased performance in the STR field; however, they have not kept up with the progress of general-purpose research in image recognition, detection, speech recognition, and text analysis. In this paper, we evaluate the performance of several deep learning schemes for the encoder part of the Transformer in STR. First, we change the baseline feed-forward network (FFN) module of the encoder to a squeeze-and-excitation (SE)-FFN or a cross stage partial (CSP)-FFN. Second, the overall architecture of the encoder is replaced with local dense synthesizer attention (LDSA) or the Conformer structure. The Conformer encoder achieves the best test accuracy in various experiments, and SE- or CSP-FFN also shows competitive performance when the number of parameters is considered. Visualizing the attention maps from different encoder combinations allows for qualitative comparison.
Graph Affinity Network for Few-Shot Segmentation
Pub Date: 2021-09-19 | DOI: 10.1109/ICIP42928.2021.9506452 | 2021 IEEE International Conference on Image Processing (ICIP)
Xiaoliu Luo, Taiping Zhang
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few annotations. Previous methods mainly establish the correspondence between support images and query images with global information. However, human perception does not tend to learn a whole representation in its entirety at once. In this paper, we propose a novel network that builds the correspondence from subparts, parts, and whole. Our network contains two novel designs: first, we adopt a graph convolutional network so that each pixel encodes not only its own information but also that of its contextual pixels; second, a learnable Graph Affinity Module (GAM) is proposed to mine more accurate relationships and to infer the common object location between the support and query images. Experiments on the PASCAL-5i dataset show that our method achieves state-of-the-art performance.