
Latest articles in Computer Vision and Image Understanding

Scene-cGAN: A GAN for underwater restoration and scene depth estimation
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-13 · DOI: 10.1016/j.cviu.2024.104225
Salma González-Sabbagh, Antonio Robles-Kelly, Shang Gao
Despite their wide scope of application, the development of underwater models for image restoration and scene depth estimation is not a straightforward task due to the limited size and quality of underwater datasets, as well as variations in water colours resulting from attenuation, absorption and scattering phenomena in the water column. To address these challenges, we present an all-in-one conditional generative adversarial network (cGAN) called Scene-cGAN. Our cGAN is a physics-based multi-domain model designed for image dewatering, restoration and depth estimation. It comprises three generators and one discriminator. To train our Scene-cGAN, we use a multi-term loss function based on uni-directional cycle-consistency and a novel dataset. This dataset is constructed from RGB-D in-air images using spectral data and concentrations of water constituents obtained from real-world water quality surveys. This approach allows us to produce imagery consistent with the radiance and veiling light corresponding to representative water types. Additionally, we compare Scene-cGAN with current state-of-the-art methods using various datasets. Results demonstrate its competitiveness in terms of colour restoration and its effectiveness in estimating the depth information for complex underwater scenes.
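As a rough illustration of the physics the synthetic dataset relies on, the sketch below implements a standard single-scattering underwater image-formation model that turns an RGB-D in-air pair into a water-tinted image. It is not the authors' pipeline; the attenuation coefficients `beta` and veiling-light colour `B` are illustrative placeholders, not values from the paper's water-quality surveys.

```python
# A minimal sketch of an underwater image-formation model: an in-air RGB image J
# and its depth map d are combined with per-channel attenuation coefficients and
# a veiling-light colour to synthesise a water-type-specific image I.
import numpy as np

def synthesize_underwater(J: np.ndarray, depth: np.ndarray,
                          beta=(0.4, 0.12, 0.08), B=(0.05, 0.35, 0.45)) -> np.ndarray:
    """J: HxWx3 in-air image in [0, 1]; depth: HxW range map in metres."""
    beta = np.asarray(beta, dtype=np.float32)               # per-channel attenuation (R, G, B)
    B = np.asarray(B, dtype=np.float32)                     # veiling (back-scatter) light colour
    t = np.exp(-beta[None, None, :] * depth[..., None])     # transmission map, HxWx3
    I = J * t + B[None, None, :] * (1.0 - t)                # attenuated signal + veiling light
    return np.clip(I, 0.0, 1.0)

if __name__ == "__main__":
    J = np.random.rand(240, 320, 3).astype(np.float32)      # stand-in for an RGB-D in-air image
    depth = np.random.uniform(0.5, 10.0, (240, 320)).astype(np.float32)
    I = synthesize_underwater(J, depth)
    print(I.shape, I.min(), I.max())
```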
Citations: 0
2S-SGCN: A two-stage stratified graph convolutional network model for facial landmark detection on 3D data
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-12 · DOI: 10.1016/j.cviu.2024.104227
Jacopo Burger, Giorgio Blandano, Giuseppe Maurizio Facchi, Raffaella Lanzarotti
Facial Landmark Detection (FLD) algorithms play a crucial role in numerous computer vision applications, particularly in tasks such as face recognition, head pose estimation, and facial expression analysis. While FLD on images has long been the focus, the emergence of 3D data has led to a surge of interest in FLD on such data, owing to its potential applications in various fields, including medical research. However, automating FLD in this context presents significant challenges, such as selecting suitable network architectures, refining outputs for precise landmark localization, and optimizing computational efficiency. In response, this paper presents a novel approach, the 2-Stage Stratified Graph Convolutional Network (2S-SGCN), which addresses these challenges comprehensively. The first stage detects landmark regions using heatmap regression, which leverages both local and long-range dependencies through a stratified approach. In the second stage, 3D landmarks are precisely determined using a new post-processing technique, namely MSE-over-mesh. 2S-SGCN ensures both efficiency and suitability for resource-constrained devices. Experimental results on 3D scans from the public Facescape and Headspace datasets, as well as on point clouds derived from FLAME meshes collected in the DAD-3DHeads dataset, demonstrate that the proposed method achieves state-of-the-art performance across various conditions. Source code is accessible at https://github.com/gfacchi-dev/CVIU-2S-SGCN.
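For readers unfamiliar with the building block named in the title, here is a minimal, generic graph-convolution layer operating on mesh vertices. It is a plain GCN-style layer for illustration only, not the paper's stratified variant or its MSE-over-mesh post-processing, and the toy adjacency and feature sizes are made up.

```python
# A generic graph-convolution layer: per-vertex features on a 3D mesh are mixed
# with their neighbours through a normalised adjacency matrix and a learned map.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        """x: (V, in_dim) per-vertex features; adj: (V, V) binary mesh adjacency."""
        a_hat = adj + torch.eye(adj.size(0))                  # add self-loops
        deg_inv_sqrt = a_hat.sum(dim=1).clamp(min=1).rsqrt()  # D^{-1/2}
        norm_adj = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]
        return torch.relu(self.linear(norm_adj @ x))          # aggregate neighbours, then project

# Toy usage: 5 mesh vertices with xyz coordinates as input features,
# regressed towards a 1-channel landmark heatmap per vertex.
verts = torch.rand(5, 3)
adj = (torch.rand(5, 5) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()                           # symmetrise the adjacency
heatmap = GraphConv(3, 1)(verts, adj)
print(heatmap.shape)  # torch.Size([5, 1])
```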
Citations: 0
Dual stage semantic information based generative adversarial network for image super-resolution
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-11 · DOI: 10.1016/j.cviu.2024.104226
Shailza Sharma, Abhinav Dhall, Shikhar Johri, Vinay Kumar, Vivek Singh
Deep learning has revolutionized image super-resolution, yet challenges persist in preserving intricate details and avoiding overly smooth reconstructions. In this work, we introduce a novel architecture, the Residue and Semantic Feature-based Dual Subpixel Generative Adversarial Network (RSF-DSGAN), which emphasizes the critical role of semantic information in addressing these issues. The proposed generator architecture is designed with two sequential stages: the Premier Residual Stage and the Deuxième Residual Stage. These stages are concatenated to form a dual-stage upsampling process, substantially augmenting the model’s capacity for feature learning. A central innovation of our approach is the integration of semantic information directly into the generator. Specifically, feature maps derived from a pre-trained network are fused with the primary feature maps of the first stage, enriching the generator with high-level contextual cues. This semantic infusion enhances the fidelity and sharpness of reconstructed images, particularly in preserving object details and textures. Inter- and intra-residual connections are employed within these stages to maintain high-frequency details and fine textures. Additionally, spectral normalization is introduced in the discriminator to stabilize training. Comprehensive evaluations, including visual perception and mean opinion scores, demonstrate that RSF-DSGAN, with its emphasis on semantic information, outperforms current state-of-the-art super-resolution methods.
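The sketch below illustrates, under stated assumptions, two of the ingredients the abstract mentions: a sub-pixel (PixelShuffle) upsampling stage that fuses "semantic" feature maps with the image features, and a spectrally normalised discriminator block. The layer sizes, the `UpsampleStage` and `sn_discriminator_block` names, and the concatenation-based fusion are illustrative guesses, not the RSF-DSGAN architecture.

```python
# Minimal sketch: one x2 sub-pixel upsampling stage with semantic-feature fusion,
# plus a spectrally normalised discriminator block to stabilise GAN training.
import torch
import torch.nn as nn

class UpsampleStage(nn.Module):
    def __init__(self, in_ch: int, sem_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch + sem_ch, out_ch * 4, 3, padding=1),  # fuse image + semantic maps
            nn.PixelShuffle(2),                                    # rearrange channels into x2 resolution
            nn.PReLU(),
        )

    def forward(self, x, sem):
        sem = nn.functional.interpolate(sem, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return self.body(torch.cat([x, sem], dim=1))

def sn_discriminator_block(in_ch: int, out_ch: int) -> nn.Module:
    return nn.Sequential(
        nn.utils.spectral_norm(nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1)),
        nn.LeakyReLU(0.2),
    )

# Toy forward pass: a 32x32 feature map upsampled twice to a 128x128 RGB output.
x = torch.rand(1, 64, 32, 32)
sem = torch.rand(1, 16, 8, 8)                 # stand-in for pre-trained network features
stage1 = UpsampleStage(64, 16, 64)
stage2 = UpsampleStage(64, 16, 3)
sr = stage2(stage1(x, sem), sem)
d = sn_discriminator_block(3, 32)(sr)
print(sr.shape, d.shape)  # torch.Size([1, 3, 128, 128]) torch.Size([1, 32, 64, 64])
```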
Citations: 0
Enhancing scene text detectors with realistic text image synthesis using diffusion models
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-06 · DOI: 10.1016/j.cviu.2024.104224
Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai
Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and time-consuming. As a solution, researchers have widely adopted synthetic text images as a complementary resource to real text images during pre-training. Yet there is still room for synthetic datasets to enhance the performance of scene text detectors. We contend that one main limitation of existing generation methods is the insufficient integration of foreground text with the background. To alleviate this problem, we present the Diffusion Model based Text Generator (DiffText), a pipeline that utilizes the diffusion model to seamlessly blend foreground text regions with the background’s intrinsic features. Additionally, we propose two strategies to generate visually coherent text with fewer spelling errors. With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors. Extensive experiments on detecting horizontal, rotated, curved, and line-level texts demonstrate the effectiveness of DiffText in producing realistic text images. Code is available at: https://github.com/99Franklin/DiffText.
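As a hedged illustration of how a diffusion model can blend generated foreground text with a fixed background, the sketch below shows the generic mask-guided sampling trick in which the background region is re-imposed from a noised copy of the real background at every reverse step. Here `denoise_step` is a stand-in for a trained diffusion model, and nothing in this sketch reproduces DiffText itself.

```python
# Generic mask-guided diffusion sampling: keep the background region consistent
# at every step so only the masked text region is generated.
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x_t: np.ndarray, t: int) -> np.ndarray:
    """Placeholder for one reverse-diffusion step of a trained model."""
    return x_t * 0.98 + rng.normal(scale=0.01, size=x_t.shape)

def blend_sample(background: np.ndarray, mask: np.ndarray, steps: int = 50) -> np.ndarray:
    """mask == 1 where text should be generated, 0 where the background is kept."""
    x = rng.normal(size=background.shape)                  # start from pure noise
    for t in reversed(range(steps)):
        noise_level = t / steps
        noised_bg = background + rng.normal(scale=noise_level, size=background.shape)
        x = mask * x + (1 - mask) * noised_bg              # re-impose the (noised) background
        x = denoise_step(x, t)                             # generate the text region
    return mask * x + (1 - mask) * background

bg = rng.random((64, 128, 3))
m = np.zeros((64, 128, 1))
m[20:44, 16:112] = 1.0                                     # a box where the text goes
out = blend_sample(bg, m)
print(out.shape)
```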
Citations: 0
UUD-Fusion: An unsupervised universal image fusion approach via generative diffusion model
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-06 · DOI: 10.1016/j.cviu.2024.104218
Xiangxiang Wang, Lixing Fang, Junli Zhao, Zhenkuan Pan, Hui Li, Yi Li
Image fusion is a classical problem in the field of image processing whose solutions are usually not unique. Common image fusion methods can only generate a fixed fusion result from a given pair of source images; they tend to be applicable only to a specific task and have high computational costs. Hence, in this paper, a two-stage unsupervised universal image fusion approach based on a generative diffusion model is proposed, termed UUD-Fusion. For the first stage, a strategy based on the initial fusion results is devised to offload the computational effort. For the second stage, two novel sampling algorithms based on the generative diffusion model are designed. The fusion sequence generation algorithm (FSGA) searches for a series of solutions in the solution space by iterative sampling, while the fusion image enhancement algorithm (FIEA) greatly improves the quality of the fused images. Qualitative and quantitative evaluations on multiple datasets with different modalities demonstrate the great versatility and effectiveness of UUD-Fusion. It is capable of solving different fusion problems, including multi-focus, multi-exposure, infrared and visible, and medical image fusion tasks, and it outperforms current state-of-the-art methods. Our code is publicly available at https://github.com/xiangxiang-wang/UUD-Fusion.
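The first stage builds on an initial fusion result; the sketch below shows one plausible, training-free way to obtain such a result by weighting each source image with a local gradient-energy activity measure. This is an assumption made for illustration, not UUD-Fusion's actual strategy or its FSGA/FIEA samplers.

```python
# Activity-weighted initial fusion: each source image contributes more where its
# local gradient energy (a simple sharpness/saliency proxy) is higher.
import numpy as np

def gradient_energy(img: np.ndarray) -> np.ndarray:
    """Per-pixel activity: squared forward differences summed over channels."""
    gx = np.diff(img, axis=1, append=img[:, -1:])
    gy = np.diff(img, axis=0, append=img[-1:, :])
    return (gx ** 2 + gy ** 2).sum(axis=-1, keepdims=True)

def initial_fusion(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    wa, wb = gradient_energy(a), gradient_energy(b)
    w = wa / (wa + wb + eps)                  # normalised activity weights in [0, 1]
    return w * a + (1.0 - w) * b

rng = np.random.default_rng(1)
src_a = rng.random((128, 128, 3))             # e.g. a near-focus source image
src_b = rng.random((128, 128, 3))             # e.g. a far-focus source image
fused = initial_fusion(src_a, src_b)
print(fused.shape, fused.dtype)
```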
Citations: 0
Unsupervised co-generation of foreground–background segmentation from Text-to-Image synthesis
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-06 · DOI: 10.1016/j.cviu.2024.104223
Yeruru Asrar Ahmed, Anurag Mittal
Text-to-Image (T2I) synthesis is a challenging task requiring modelling of both the textual and image domains and their relationship. The substantial improvement in image quality achieved by recent works has paved the way for numerous applications such as language-aided image editing, computer-aided design, text-based image retrieval, and training data augmentation. In this work, we ask a simple question: along with realistic images, can we obtain any useful by-product (e.g. foreground/background or multi-class segmentation masks, detection labels) in an unsupervised way that will also benefit other computer vision tasks and applications? In an attempt to answer this question, we explore generating realistic images and their corresponding foreground/background segmentation masks from the given text. To achieve this, we combine the concept of co-segmentation with a GAN. Specifically, a novel GAN architecture called Co-Segmentation Inspired GAN (COS-GAN) is proposed that generates two or more images simultaneously from different noise vectors and utilises a spatial co-attention mechanism between the image features to produce realistic segmentation masks for each of the generated images. The advantages of such an architecture are two-fold: (1) the generated segmentation masks can be used to focus on foreground and background exclusively to improve the quality of generated images, and (2) the segmentation masks can be used as a training target for other tasks, such as object localisation and segmentation. Extensive experiments conducted on the CUB, Oxford-102, and COCO datasets show that COS-GAN is able to improve visual quality and generate reliable foreground/background masks for the generated images.
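A minimal sketch of a spatial co-attention block of the general kind described above: feature maps from two generated images attend to each other through a cross-image affinity matrix, and the aggregated responses are squashed into a one-channel foreground mask per image. The module name, projections, and shapes are illustrative assumptions, not the COS-GAN implementation.

```python
# Cross-image spatial co-attention producing a foreground mask for each input.
import torch
import torch.nn as nn

class SpatialCoAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)        # shared 1x1 projection
        self.mask_head = nn.Conv2d(channels, 1, 1)          # per-pixel foreground score

    def forward(self, f1: torch.Tensor, f2: torch.Tensor):
        b, c, h, w = f1.shape
        q1 = self.proj(f1).flatten(2)                       # (B, C, HW)
        q2 = self.proj(f2).flatten(2)
        affinity = torch.bmm(q1.transpose(1, 2), q2)        # (B, HW, HW) cross-image affinity
        a1 = torch.softmax(affinity, dim=2) @ q2.transpose(1, 2)                 # f1 attends to f2
        a2 = torch.softmax(affinity.transpose(1, 2), dim=2) @ q1.transpose(1, 2) # f2 attends to f1
        a1 = a1.transpose(1, 2).view(b, c, h, w)
        a2 = a2.transpose(1, 2).view(b, c, h, w)
        m1 = torch.sigmoid(self.mask_head(f1 + a1))         # foreground mask for image 1
        m2 = torch.sigmoid(self.mask_head(f2 + a2))         # foreground mask for image 2
        return m1, m2

f1, f2 = torch.rand(2, 32, 16, 16), torch.rand(2, 32, 16, 16)
m1, m2 = SpatialCoAttention(32)(f1, f2)
print(m1.shape, m2.shape)  # torch.Size([2, 1, 16, 16]) twice
```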
Citations: 0
Bilevel progressive homography estimation via correlative region-focused transformer
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-05 · DOI: 10.1016/j.cviu.2024.104209
Qi Jia, Xiaomei Feng, Wei Zhang, Yu Liu, Nan Pu, Nicu Sebe
We propose a novel correlative region-focused transformer for accurate homography estimation via a bilevel progressive architecture. Existing methods typically use the entire image features to establish correlations for a pair of input images, but irrelevant regions often introduce mismatches and outliers. In contrast, our network effectively mitigates the negative impact of irrelevant regions through a bilevel progressive homography estimation architecture. Specifically, in the outer iteration, we progressively estimate the homography matrix at different feature scales; in the inner iteration, we dynamically extract correlative regions and progressively focus on their corresponding features from both inputs. Moreover, we develop a quadtree attention mechanism based on the transformer to explicitly capture the correspondence between the input images, localizing and cropping the correlative regions for the next iteration. This progressive training strategy enhances feature consistency and enables precise alignment at comparable inference rates. Extensive qualitative and quantitative comparisons show that the proposed method achieves competitive alignment results while reducing the mean average corner error (MACE) on the MS-COCO dataset compared to previous methods, without additional parameter cost.
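To make the outer progressive loop concrete, the sketch below refits a homography from accumulated four-corner displacements at each coarse-to-fine iteration using a standard DLT solve; `predict_offsets` is a placeholder for a network's per-scale prediction, so this is illustrative scaffolding rather than the proposed method.

```python
# Progressive 4-point homography refinement: accumulate residual corner offsets
# across scales and refit H with a direct linear transform (DLT).
import numpy as np

def dlt_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Fit H (3x3) such that dst ~ H @ src for four point pairs."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = vt[-1].reshape(3, 3)                 # null-space vector of the DLT system
    return H / H[2, 2]

def predict_offsets(scale: int) -> np.ndarray:
    """Placeholder for the per-scale residual corner offsets a network would predict."""
    return np.random.default_rng(scale).normal(scale=2.0 / (scale + 1), size=(4, 2))

corners = np.array([[0, 0], [128, 0], [128, 128], [0, 128]], dtype=np.float64)
accumulated = corners.copy()
for scale in range(3):                       # coarse-to-fine outer iterations
    accumulated += predict_offsets(scale)    # refine the corner displacements
    H = dlt_homography(corners, accumulated)
print(np.round(H, 3))
```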
Citations: 0
Leaf cultivar identification via prototype-enhanced learning
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-05 · DOI: 10.1016/j.cviu.2024.104221
Yiyi Zhang, Zhiwen Ying, Ying Zheng, Cuiling Wu, Nannan Li, Fangfang Wang, Jun Wang, Xianzhong Feng, Xiaogang Xu
Leaf cultivar identification, as a typical task of ultra-fine-grained visual classification (UFGVC), faces a huge challenge due to the high similarity among different varieties. In practice, an instance may be related to multiple varieties to varying degrees, especially in UFGVC datasets. However, deep learning methods trained on one-hot labels fail to reflect patterns shared across categories and thus perform poorly on this task. As an analogy to natural language processing (NLP), by capturing the correlation between labels, label embedding can select the most informative words and neglect irrelevant ones when predicting different labels. Based on this intuition, we propose a novel method named Prototype-enhanced Learning (PEL), which is predicated on the assumption that label embeddings encoded with inter-class relationships would force the image classification model to focus on discriminative patterns. In addition, a new prototype update module is put forward to learn inter-class relations by capturing label semantic overlap and to iteratively update prototypes to generate continuously enhanced soft targets. Prototype-enhanced soft labels not only contain the original one-hot label information, but also introduce rich inter-category semantic association information, thus providing more effective supervision for deep model training. Extensive experimental results on 7 public datasets show that our method can significantly improve performance on the task of ultra-fine-grained visual classification. The code is available at https://github.com/YIYIZH/PEL.
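A small sketch of what "prototype-enhanced" soft targets can look like: class prototypes are taken as mean features, inter-class similarity is turned into a distribution, and that distribution is blended with the one-hot label. The blending weight `alpha` and the temperature are illustrative assumptions, not PEL's actual update rule.

```python
# Prototype-based soft labels: blend one-hot targets with an inter-class
# similarity distribution computed from class prototypes.
import torch
import torch.nn.functional as F

def class_prototypes(features: torch.Tensor, labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    protos = torch.zeros(num_classes, features.size(1))
    for c in range(num_classes):
        members = features[labels == c]
        if len(members) > 0:
            protos[c] = members.mean(dim=0)       # prototype = mean feature of the class
    return protos

def prototype_soft_targets(labels: torch.Tensor, protos: torch.Tensor,
                           alpha: float = 0.3, temperature: float = 0.1) -> torch.Tensor:
    normed = F.normalize(protos, dim=1)
    sim = normed @ normed.t()                     # (C, C) cosine similarity between prototypes
    sim.fill_diagonal_(float("-inf"))             # exclude a class's similarity to itself
    inter_class = F.softmax(sim / temperature, dim=-1)
    one_hot = F.one_hot(labels, num_classes=protos.size(0)).float()
    return (1 - alpha) * one_hot + alpha * inter_class[labels]

feats = torch.randn(32, 64)                       # backbone features for a batch
labels = torch.randint(0, 5, (32,))
protos = class_prototypes(feats, labels, num_classes=5)
targets = prototype_soft_targets(labels, protos)
print(targets.shape, targets.sum(dim=1)[:3])      # rows still sum to 1
```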
Citations: 0
Enhanced local multi-windows attention network for lightweight image super-resolution
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-05 · DOI: 10.1016/j.cviu.2024.104217
Yanheng Lv, Lulu Pan, Ke Xu, Guo Li, Wenbo Zhang, Lingxiao Li, Le Lei
Since the global self-attention mechanism captures long-distance dependencies well, Transformer-based methods have achieved remarkable performance in many vision tasks, including single-image super-resolution (SISR). However, images exhibit strong local self-similarities; if global self-attention is applied indiscriminately, computing resources may be wasted on weakly correlated parts of the image. In high-resolution, large-size images especially, global self-attention leads to a large number of redundant calculations. To solve this problem, we propose the Enhanced Local Multi-windows Attention Network (ELMA), which contains two main designs. First, unlike traditional self-attention based on square window partitioning, we propose a Multi-windows Self-Attention (M-WSA) that uses a new window partitioning mechanism to obtain different types of local long-distance dependencies. Analysis and experiments show that, compared with the self-attention mechanisms commonly used in other SR networks, M-WSA reduces computational complexity and achieves superior performance. Secondly, we propose a Spatial Gated Network (SGN) as the feed-forward network, which effectively overcomes the problem of intermediate channel redundancy in a traditional MLP, thereby improving the parameter utilization and computational efficiency of the network. Meanwhile, SGN introduces into the feed-forward network spatial information that a traditional MLP cannot obtain, allowing the model to better understand and use the spatial structure of the image and enhancing network performance and generalization ability. Extensive experiments show that ELMA achieves competitive performance compared to state-of-the-art methods with fewer parameters and lower computational costs.
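As a hedged sketch of a spatially gated feed-forward block of the general kind the abstract contrasts with a plain MLP: the expanded features are split into two branches, one branch is mixed spatially with a depthwise convolution and used to gate the other. Channel sizes and the gating arrangement are illustrative, not ELMA's SGN.

```python
# Spatially gated feed-forward block: a depthwise-convolved gate injects spatial
# structure that a purely pointwise MLP would miss.
import torch
import torch.nn as nn

class SpatialGatedFFN(nn.Module):
    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Conv2d(dim, hidden * 2, 1)                              # pointwise expansion, two branches
        self.spatial = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)    # depthwise spatial mixing
        self.project = nn.Conv2d(hidden, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.expand(x).chunk(2, dim=1)
        gate = self.spatial(gate)                               # inject spatial structure into the gate
        return self.project(nn.functional.gelu(gate) * value)   # gated feed-forward output

x = torch.rand(1, 48, 32, 32)
y = SpatialGatedFFN(48)(x)
print(y.shape)  # torch.Size([1, 48, 32, 32])
```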
Citations: 0
Simultaneous image denoising and completion through convolutional sparse representation and nonlocal self-similarity
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-11-04 · DOI: 10.1016/j.cviu.2024.104216
Weimin Yuan, Yuanyuan Wang, Ruirui Fan, Yuxuan Zhang, Guangmei Wei, Cai Meng, Xiangzhi Bai
Low rank matrix approximation (LRMA) has been widely studied due to its capability of approximating the original image from a degraded observation. Depending on the characteristics of the degraded images, image denoising and image completion have become the main objects of study; existing methods are usually designed for a single task. In this paper, focusing on the task of simultaneous image denoising and completion, we propose a weighted low rank sparse representation model and the corresponding efficient algorithm based on LRMA. The proposed method integrates convolutional analysis sparse representation (ASR) and nonlocal statistical modeling to maintain the local smoothness and nonlocal self-similarity (NLSM) of natural images. More importantly, given the complexity of simultaneous image denoising and completion, we explore the alternating direction method of multipliers (ADMM) to solve the above inverse problem efficiently. We conduct experiments on image completion from partial random samples and on mask removal with different noise levels. Extensive experiments on four datasets, i.e., Set12, Kodak, McMaster, and CBSD68, show that the proposed method prevents the propagation of noise while completing images and achieves better quantitative results and human visual quality compared to 17 methods. The proposed method achieves (1.9%, 1.8%, 4.2%, and 3.7%) gains in average PSNR and (4.2%, 2.9%, 6.7%, and 6.6%) gains in average SSIM over the second-best method across the four datasets, respectively. We also demonstrate that our method handles challenging scenarios well. Source code is available at https://github.com/weimin581/demo_CSRNS.
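For context on the optimisation machinery, the sketch below runs the generic low-rank plus sparse ADMM split (robust-PCA style) built from singular-value thresholding and soft thresholding; it is not the paper's weighted, nonlocal formulation, and `lam`, `mu`, and the iteration count are illustrative choices.

```python
# Generic low-rank + sparse ADMM: decompose D into a low-rank part L (nuclear-norm
# proximal step via SVD thresholding) and a sparse part S (l1 proximal step).
import numpy as np

def svt(X: np.ndarray, tau: float) -> np.ndarray:
    """Singular-value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X: np.ndarray, tau: float) -> np.ndarray:
    """Element-wise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def low_rank_sparse_admm(D: np.ndarray, lam: float = 0.05, mu: float = 1.0, iters: int = 100):
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)   # Y: scaled dual variable
    for _ in range(iters):
        L = svt(D - S + Y / mu, 1.0 / mu)
        S = soft(D - L + Y / mu, lam / mu)
        Y = Y + mu * (D - L - S)               # dual ascent on the constraint D = L + S
    return L, S

rng = np.random.default_rng(2)
clean = rng.standard_normal((64, 5)) @ rng.standard_normal((5, 64))    # rank-5 ground truth
noisy = clean + (rng.random((64, 64)) < 0.05) * rng.normal(scale=5.0, size=(64, 64))
L, S = low_rank_sparse_admm(noisy)
print(L.shape, float(np.linalg.norm(noisy - L - S)))                   # constraint residual
```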
Citations: 0