
Latest publications from the 2021 IEEE International Conference on Image Processing (ICIP)

Image Fusion Through Linear Embeddings
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506168
Oguzhan Ulucan, Diclehan Karakaya, Mehmet Türkan
This paper proposes an effective technique for the multi-exposure image fusion and visible-infrared image fusion problems. Multi-exposure fusion algorithms generally extract faulty weight maps when the input stack contains multiple and/or severely over-exposed images. To overcome this issue, an alternative method for weight map characterization and refinement is developed, alongside the perspectives of linear embeddings of images and adaptive morphological masking. This framework is then extended to the visible and infrared image fusion problem. Comprehensive experimental comparisons demonstrate that the proposed algorithm significantly enhances the fused image quality both statistically and visually.
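For context, the sketch below shows the generic weight-map-based multi-exposure fusion pipeline the abstract builds on: each exposure gets a per-pixel weight map, the maps are normalized across the stack, and the images are blended. The `well_exposedness` weight is an illustrative choice, not the paper's linear-embedding-based weights or its morphological refinement.

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """Per-pixel weight favoring intensities near mid-gray (illustrative choice,
    not the paper's linear-embedding-based weights)."""
    return np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)).prod(axis=-1)

def fuse_exposure_stack(stack):
    """Blend an exposure stack with normalized per-pixel weight maps.

    stack: list of HxWx3 float images in [0, 1] taken at different exposures.
    """
    weights = np.stack([well_exposedness(img) for img in stack])       # N x H x W
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12              # normalize across exposures
    fused = sum(w[..., None] * img for w, img in zip(weights, stack))  # weighted blend
    return np.clip(fused, 0.0, 1.0)
```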
Citations: 3
Cmdm-Vac: Improving A Perceptual Quality Metric For 3D Graphics By Integrating A Visual Attention Complexity Measure
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506662
Y. Nehmé, Mona Abid, G. Lavoué, Matthieu Perreira Da Silva, P. Callet
Many objective quality metrics have been proposed over the years to automate the task of subjective quality assessment. However, few of them are designed for 3D graphical contents with appearance attributes; existing ones are based on geometry and color measures, yet they ignore the visual saliency of the objects. In this paper, we combined an optimal subset of geometry-based and color-based features, provided by a state-of-the-art quality metric for 3D colored meshes, with a visual attention complexity feature adapted to 3D graphics. The performance of our proposed new metric is evaluated on a dataset of 80 meshes with diffuse colors, generated from 5 source models corrupted by commonly used geometry and color distortions. With our proposed metric, we showed that the use of the attentional complexity feature brings a significant gain in performance and better stability.
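As a rough illustration of combining a small optimal feature subset with an attention-complexity feature, the sketch below searches feature subsets by cross-validated regression against subjective scores. The exhaustive search, the linear regressor, and the synthetic data are assumptions for illustration; the paper's actual feature-selection and fitting procedure is not reproduced here.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def select_feature_subset(X, y, max_size=4):
    """Exhaustively search small feature subsets and keep the one whose linear
    fit to the subjective scores cross-validates best (illustrative strategy)."""
    best_idx, best_score = None, -np.inf
    for k in range(1, max_size + 1):
        for idx in combinations(range(X.shape[1]), k):
            score = cross_val_score(LinearRegression(), X[:, list(idx)], y, cv=5).mean()
            if score > best_score:
                best_idx, best_score = idx, score
    return best_idx

# Hypothetical data: rows = distorted meshes, columns = geometry/color features
# plus one visual-attention-complexity feature; y = subjective scores (MOS).
X, y = np.random.rand(80, 8), np.random.rand(80)
print(select_feature_subset(X, y))
```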
Citations: 3
AKFNET: An Anatomical Knowledge Embedded Few-Shot Network For Medical Image Segmentation
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506721
Yanan Wei, Jiang Tian, Cheng Zhong, Zhongchao Shi
Automated organ segmentation in CT scans is an essential prerequisite for many clinical applications, such as computer-aided diagnosis and intervention. As medical data annotation requires massive human labor from experienced radiologists, how to effectively improve segmentation performance with limited annotated training data remains a challenging problem. Few-shot learning imitates the learning process of humans and turns out to be a promising way to overcome the aforementioned challenge. In this paper, we propose a novel anatomical knowledge embedded few-shot network (AKFNet), where an anatomical knowledge embedded support unit (AKSU) is carefully designed to embed the anatomical priors from support images into our model. Moreover, a similarity guidance alignment unit (SGAU) is proposed to impose a mutual alignment between the support and query sets. As a result, AKFNet fully exploits anatomical knowledge and presents good learning capability. Without bells and whistles, AKFNet outperforms the state-of-the-art methods with a 0.84-1.76% increase in Dice score. Transfer learning experiments further verify its learning capability.
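To make the support/query setup concrete, here is a minimal masked-average-pooling sketch, a standard few-shot segmentation building block: support features are pooled inside the organ mask to form a prototype, which is then compared to every query pixel. This is only background for the abstract; AKFNet's AKSU and SGAU units are not reproduced.

```python
import torch
import torch.nn.functional as F

def masked_average_pooling(feat, mask):
    """Pool support features inside the organ mask to get a class prototype.

    feat: B x C x H x W support feature map; mask: B x 1 x h x w binary mask.
    """
    mask = F.interpolate(mask.float(), size=feat.shape[-2:],
                         mode="bilinear", align_corners=False)
    return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)  # B x C

def query_similarity(query_feat, proto):
    """Cosine similarity between every query pixel and the support prototype."""
    proto = proto[:, :, None, None]                       # B x C x 1 x 1
    return F.cosine_similarity(query_feat, proto, dim=1)  # B x H x W
```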
Citations: 1
A Unified Density-Driven Framework For Effective Data Denoising And Robust Abstention
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506754
Krishanu Sarker, Xiulong Yang, Yang Li, S. Belkasim, Shihao Ji
The success of Deep Neural Networks (DNNs) highly depends on data quality. Moreover, predictive uncertainty reduces the reliability of DNNs in real-world applications. In this paper, we aim to address these two issues by proposing a unified filtering framework that leverages the underlying data density to effectively denoise training data and avoid predictions on confusing samples. Our proposed framework differentiates noise from clean data samples without modifying existing DNN architectures or loss functions. Extensive experiments on multiple benchmark datasets and the recent COVIDx dataset demonstrate the effectiveness of our framework over state-of-the-art (SOTA) methods in denoising training data and abstaining on uncertain test data.
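A minimal sketch of density-driven filtering, assuming features have already been extracted by a DNN: samples are scored by a k-nearest-neighbor density estimate, low-density training samples are dropped as likely noise, and the same score can be thresholded at test time to abstain. The k-NN estimator and the keep ratio are illustrative stand-ins for the paper's density model.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_density_scores(feats, k=10):
    """Score each sample by the inverse mean distance to its k nearest neighbors
    (a simple stand-in for the paper's underlying-density estimate)."""
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(feats).kneighbors(feats)
    return 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)   # drop the self-distance column

def filter_by_density(feats, labels, keep_ratio=0.9):
    """Keep the highest-density fraction of training samples; the same score can
    be thresholded at test time to abstain on uncertain inputs."""
    scores = knn_density_scores(feats)
    keep = scores >= np.quantile(scores, 1.0 - keep_ratio)
    return feats[keep], labels[keep]
```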
Citations: 1
Using Fisheye Camera For Cost-Effective Multi-View People Localization
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506204
Yueh-Cheng Huang, Chin-Wei Liu, Jen-Hui Chuang
In advanced multi-view video surveillance systems, people localization is usually a crucial part of the complete system and needs to be accomplished in a short time to reserve sufficient processing time for subsequent high-level analysis. As the surveillance area increases, a large number of cameras must be installed for multi-view people localization. To lower the equipment cost and setup time, we incorporate a fisheye (or wide-angle) camera into an efficient vanishing point-based line sampling scheme for people localization, by ensuring the fisheye camera is looking downward so that its principal point becomes the vanishing point of vertical lines. Experimental results show that the utilization of a fisheye camera can (i) achieve localization accuracy comparable or superior to that of ordinary cameras, (ii) reduce the camera count by 75% on average while covering the same or a larger monitored area, and (iii) greatly simplify the camera installation process.
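The geometric fact the method relies on can be checked numerically: with the optical axis pointing straight down, points on a vertical segment (a standing person) share their azimuth, so their fisheye projections fall on one radial line through the principal point. The equidistant model r = f*theta, the focal length, and the principal point below are assumptions made only for this illustration.

```python
import numpy as np

def fisheye_project(P, f=300.0, c=(640.0, 480.0)):
    """Equidistant fisheye projection r = f * theta of camera-frame points
    (the model, focal length, and principal point are assumed for illustration)."""
    X, Y, Z = P[..., 0], P[..., 1], P[..., 2]
    theta = np.arctan2(np.hypot(X, Y), Z)   # angle from the downward optical axis
    phi = np.arctan2(Y, X)                  # azimuth
    return np.stack([c[0] + f * theta * np.cos(phi),
                     c[1] + f * theta * np.sin(phi)], axis=-1)

# A standing person under a downward-looking camera is a vertical segment with
# fixed (X, Y) and varying depth Z, so phi is constant and all projections lie
# on one radial line through the principal point.
person = np.array([[1.0, 2.0, z] for z in np.linspace(2.0, 4.0, 5)])
uv = fisheye_project(person) - np.array([640.0, 480.0])
print(np.allclose(np.cross(uv[:-1], uv[1:]), 0.0))  # True: collinear radial samples
```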
Citations: 4
Attend, Correct And Focus: A Bidirectional Correct Attention Network For Image-Text Matching
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506438
Yang Liu, Huaqiu Wang, Fanyang Meng, Mengyuan Liu, Hong Liu
The image-text matching task aims to learn the fine-grained correspondences between images and sentences. Existing methods use an attention mechanism to learn the correspondences by attending to all fragments without considering the relationship between fragments and global semantics, which inevitably leads to semantic misalignment among irrelevant fragments. To this end, we propose a Bidirectional Correct Attention Network (BCAN), which leverages global similarities and local similarities to reassign the attention weights and avoid such semantic misalignment. Specifically, we introduce a global correct unit to correct the attention focused on relevant fragments in irrelevant semantics. A local correct unit is used to correct the attention focused on irrelevant fragments in relevant semantics. Experiments on the Flickr30K and MSCOCO datasets verify the effectiveness of our proposed BCAN, which outperforms both previous attention-based methods and state-of-the-art methods. Code can be found at: https://github.com/liuyyy111/BCAN.
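For background, the sketch below is the standard text-to-image cross-attention matching score that correct-attention approaches refine: each word attends over region features, and the image-sentence score averages the word-to-attended-context similarities. It is not BCAN's global/local correct units, only the baseline they operate on; the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def cross_attention_score(regions, words, temperature=9.0):
    """Text-to-image cross-attention matching score (generic baseline).

    regions: R x D image region features; words: T x D word features.
    """
    sim = F.normalize(words, dim=-1) @ F.normalize(regions, dim=-1).t()  # T x R local similarities
    attn = F.softmax(temperature * sim, dim=-1)       # each word attends over regions
    attended = attn @ regions                         # T x D attended image context
    return F.cosine_similarity(words, attended, dim=-1).mean()
```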
Citations: 1
A Deep Learning Method for Frame Selection in Videos for Structure from Motion Pipelines
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506227
F. Banterle, R. Gong, M. Corsini, F. Ganovelli, L. Gool, Paolo Cignoni
Structure-from-Motion (SfM) using the frames of a video sequence can be a challenging task: the frames contain a lot of redundant information, the computational time increases quadratically with the number of frames, and low-quality images (e.g., blurred frames) can decrease the final quality of the reconstruction. To overcome these issues, we present a novel deep-learning architecture that speeds up SfM by selecting frames using a predicted sub-sampling frequency. This architecture is general and can learn/distill the knowledge of any algorithm for selecting frames from a video for generating high-quality reconstructions. One key advantage is that we can run our architecture in real time, saving computations while keeping high-quality results.
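A hand-crafted sketch of the frame-selection idea: a fixed sub-sampling step stands in for the paper's predicted sub-sampling frequency, and a variance-of-Laplacian sharpness test stands in for the learned quality filter; both thresholds are assumptions made for illustration.

```python
import cv2

def select_frames(video_path, step, blur_threshold=100.0):
    """Keep every `step`-th frame, skipping low-sharpness ones.

    `step` stands in for a predicted sub-sampling frequency; the
    variance-of-Laplacian test is a hand-crafted stand-in for a learned selector.
    """
    cap = cv2.VideoCapture(video_path)
    kept, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if cv2.Laplacian(gray, cv2.CV_64F).var() >= blur_threshold:
                kept.append(idx)   # frame is sharp enough for SfM
        idx += 1
    cap.release()
    return kept
```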
Citations: 2
Unsupervised Person Re-Identification Via Global-Level And Patch-Level Discriminative Feature Learning
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506220
Zongzhe Sun, Feng Zhao, Feng Wu
Due to the lack of labeled data, it is usually difficult for an unsupervised person re-identification (re-ID) model to learn discriminative features. To address this issue, we propose a global-level and patch-level unsupervised feature learning framework that utilizes both global and local information to obtain more discriminative features. For global-level learning, we design a global similarity-based loss (GSL) to leverage the similarities between whole images. Along with a memory-based non-parametric classifier, the GSL pulls credible samples closer to help train a discriminative model. For patch-level learning, we use a patch generation module to produce different patches. Applying the patch-based discriminative feature learning loss and the image-level feature learning loss, the patch branch in the network can learn more representative patch features. Combining global-level learning with patch-level learning, we obtain a more discriminative re-ID model. Experimental results on the Market-1501 and DukeMTMC-reID datasets validate the superiority and effectiveness of our method in unsupervised person re-ID.
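To illustrate the memory-based non-parametric classifier mentioned in the abstract, here is a common instance-memory design: one normalized feature slot per training image, cosine-similarity scores against the whole bank, and momentum updates. The GSL itself and the patch branch are not reproduced; the momentum and temperature values are assumptions.

```python
import torch
import torch.nn.functional as F

class FeatureMemory:
    """Memory-based non-parametric classifier over instance features
    (a common design, not the paper's exact formulation)."""

    def __init__(self, num_samples, dim, momentum=0.5, temperature=0.05):
        self.bank = F.normalize(torch.randn(num_samples, dim), dim=1)
        self.momentum = momentum
        self.temperature = temperature

    def scores(self, feats):
        """Similarity of a batch of features to every stored instance."""
        return F.normalize(feats, dim=1) @ self.bank.t() / self.temperature

    @torch.no_grad()
    def update(self, feats, indices):
        """Momentum-update the slots of the images in the current batch."""
        new = self.momentum * self.bank[indices] + (1.0 - self.momentum) * F.normalize(feats, dim=1)
        self.bank[indices] = F.normalize(new, dim=1)
```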
Citations: 4
Analysis of the Novel Transformer Module Combination for Scene Text Recognition
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506779
Yeon-Gyu Kim, Hyunsung Kim, Minseok Kang, Hyug-Jae Lee, Rokkyu Lee, Gunhan Park
Various methods for scene text recognition (STR) are proposed every year. These methods have dramatically improved performance in the STR field; however, they have not kept up with the progress of general-purpose research in image recognition, detection, speech recognition, and text analysis. In this paper, we evaluate the performance of several deep learning schemes for the encoder part of the Transformer in STR. First, we change the baseline feed-forward network (FFN) module of the encoder to a squeeze-and-excitation (SE)-FFN or a cross stage partial (CSP)-FFN. Second, the overall architecture of the encoder is replaced with local dense synthesizer attention (LDSA) or a Conformer structure. The Conformer encoder achieves the best test accuracy in various experiments, and SE-FFN and CSP-FFN also show competitive performance when the number of parameters is considered. Visualizing the attention maps from different encoder combinations allows for a qualitative comparison of their performance.
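One plausible reading of an "SE-FFN" is a Transformer feed-forward block whose hidden channels are gated by a squeeze-and-excitation branch; the sketch below implements that reading. The exact wiring in the paper (where the squeeze is taken, the reduction ratio, residual placement) may differ, so treat the details as assumptions.

```python
import torch
import torch.nn as nn

class SEFeedForward(nn.Module):
    """Feed-forward block with a squeeze-and-excitation gate on its hidden
    channels (one plausible reading of an "SE-FFN").
    Input: (batch, seq_len, d_model)."""

    def __init__(self, d_model=256, d_ff=1024, reduction=16, dropout=0.1):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.act = nn.ReLU(inplace=True)
        self.drop = nn.Dropout(dropout)
        self.se = nn.Sequential(                        # squeeze over the sequence, excite channels
            nn.Linear(d_ff, d_ff // reduction), nn.ReLU(inplace=True),
            nn.Linear(d_ff // reduction, d_ff), nn.Sigmoid())

    def forward(self, x):
        h = self.act(self.up(x))                        # (B, T, d_ff)
        gate = self.se(h.mean(dim=1, keepdim=True))     # (B, 1, d_ff) channel attention
        return self.down(self.drop(h * gate))
```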
Citations: 1
Graph Affinity Network for Few-Shot Segmentation
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506452
Xiaoliu Luo, Taiping Zhang
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few annotations. Previous methods mainly establish the correspondence between support images and query images with global information. However, human perception does not tend to learn a whole representation in its entirety at once. In this paper, we propose a novel network to build the correspondence from subparts, parts, and the whole. Our network mainly contains two novel designs: we first adopt a graph convolutional network so that each pixel encodes not only its own information but also that of its contextual pixels, and we then propose a learnable Graph Affinity Module (GAM) to mine more accurate relationships, as well as common object location inference, between the support images and the query images. Experiments on the PASCAL-5i dataset show that our method achieves state-of-the-art performance.
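A minimal illustration of affinity-based support-to-query matching: normalized pixel features form an affinity matrix, and the support mask is propagated through it to give a soft foreground prior on the query. This shows only the generic mechanism; the paper's learnable GAM and graph-convolution layers are not reproduced, and the temperature is an assumption.

```python
import torch
import torch.nn.functional as F

def affinity_propagate(query_feat, support_feat, support_mask, temperature=0.1):
    """Propagate the support mask to the query through a pixel-level affinity matrix.

    query_feat / support_feat: C x H x W feature maps; support_mask: H x W in {0, 1}.
    """
    C, H, W = query_feat.shape
    q = F.normalize(query_feat.reshape(C, -1), dim=0)        # C x HW_q
    s = F.normalize(support_feat.reshape(C, -1), dim=0)      # C x HW_s
    affinity = F.softmax(q.t() @ s / temperature, dim=-1)    # HW_q x HW_s
    prior = affinity @ support_mask.reshape(-1, 1).float()   # soft foreground prior per query pixel
    return prior.reshape(H, W)
```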
Citations: 1