Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision: Latest Articles

MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning

Pub Date : 2023-02-18 DOI: 10.48550/arXiv.2302.09352

Caoyun Fan, Wenqing Chen, Jidong Tian, Yitian Li, Hao He, Yaohui Jin

When modeling related tasks in computer vision, Multi-Task Learning (MTL) can outperform Single-Task Learning (STL) due to its ability to capture intrinsic relatedness among tasks. However, MTL may encounter the insufficient training problem, i.e., some tasks in MTL may end up in a non-optimal situation compared with STL. A series of studies point out that too much gradient noise leads to performance degradation in STL. In the MTL scenario, Inter-Task Gradient Noise (ITGN) is an additional source of gradient noise for each task, which can also affect the optimization process. In this paper, we identify ITGN as a key factor leading to the insufficient training problem. We define the Gradient-to-Noise Ratio (GNR) to measure the relative magnitude of gradient noise and design the MaxGNR algorithm to alleviate the ITGN interference of each task by maximizing the GNR of each task. We carefully evaluate our MaxGNR algorithm on two standard image MTL datasets: NYUv2 and Cityscapes. The results show that our algorithm outperforms the baselines under identical experimental conditions.
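
As a rough illustration of what a gradient-to-noise ratio can look like, the sketch below treats one task's mini-batch gradients as samples, takes the squared norm of their mean as the signal and their average squared deviation from the mean as the noise. This formula, and the names `gradient_to_noise_ratio`, `clean_task`, and `noisy_task`, are assumptions made for illustration only; the abstract does not state the paper's exact definition of GNR or how MaxGNR turns it into task weights.

```python
import numpy as np

def gradient_to_noise_ratio(grads):
    """Illustrative ratio (an assumption, not the paper's definition).

    grads: array of shape (num_batches, num_params), one gradient per mini-batch.
    """
    grads = np.asarray(grads, dtype=float)
    mean_grad = grads.mean(axis=0)
    # Signal: squared norm of the average gradient.
    signal = float(np.sum(mean_grad ** 2))
    # Noise: average squared deviation of each mini-batch gradient from the mean.
    noise = float(np.mean(np.sum((grads - mean_grad) ** 2, axis=1)))
    return signal / (noise + 1e-12)

# Synthetic stand-ins for two tasks sharing the same underlying gradient.
rng = np.random.default_rng(0)
true_grad = np.ones(8)
clean_task = true_grad + 0.1 * rng.standard_normal((64, 8))
noisy_task = true_grad + 2.0 * rng.standard_normal((64, 8))

gnr_clean = gradient_to_noise_ratio(clean_task)
gnr_noisy = gradient_to_noise_ratio(noisy_task)
```

Under this toy definition, the task with the larger gradient spread has the lower ratio, which is the kind of quantity a dynamic weighting strategy could try to raise.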

Citations: 0

NoiseTransfer: Image Noise Generation with Contrastive Embeddings

Pub Date : 2023-01-31 DOI: 10.48550/arXiv.2301.13554

Seunghwan Lee, Tae Hyun Kim

Deep image denoising networks have achieved impressive success with the help of a considerably large number of synthetic training datasets. However, real-world denoising is still a challenging problem due to the dissimilarity between the distributions of real and synthetic noisy datasets. Although several real-world noisy datasets have been presented, the number of training datasets (i.e., pairs of clean and real noisy images) is limited, and acquiring more real noise datasets is laborious and expensive. To mitigate this problem, numerous attempts to simulate real noise using generative models have been made. Nevertheless, previous works had to train multiple networks to handle multiple different noise distributions. By contrast, we propose a new generative model that can synthesize noisy images with multiple different noise distributions. Specifically, we adopt recent contrastive learning to learn distinguishable latent features of the noise. Moreover, our model can generate new noisy images by transferring the noise characteristics solely from a single reference noisy image. We demonstrate the accuracy and the effectiveness of our noise model for both known and unknown noise removal.

Citations: 1

Layout-guided Indoor Panorama Inpainting with Plane-aware Normalization

Pub Date : 2023-01-13 DOI: 10.48550/arXiv.2301.05624

Chaochen Gao, Cheng Chen, Jheng-Wei Su, Hung-Kuo Chu

We present an end-to-end deep learning framework for indoor panoramic image inpainting. Although previous inpainting methods have shown impressive performance on natural perspective images, most fail to handle panoramic images, particularly indoor scenes, which usually contain complex structure and texture content. To achieve better inpainting quality, we propose to exploit both the global and local context of indoor panoramas during the inpainting process. Specifically, we take the low-level layout edges estimated from the input panorama as a prior to guide the inpainting model in recovering the global indoor structure. A plane-aware normalization module is employed to embed plane-wise style features derived from the layout into the generator, encouraging local texture restoration from adjacent room structures (i.e., ceiling, floor, and walls). Experimental results show that our work outperforms the current state-of-the-art methods on a public panoramic dataset in both qualitative and quantitative evaluations. Our code is available at https://ericsujw.github.io/LGPN-net/

Citations: 2

Layered-Garment Net: Generating Multiple Implicit Garment Layers from a Single Image

Pub Date : 2022-11-22 DOI: 10.48550/arXiv.2211.11931

Alakh Aggarwal, Ji-kai Wang, S. Hogue, Saifeng Ni, M. Budagavi, Xiaohu Guo

Recent research has focused on generating human models and garments from 2D images. However, state-of-the-art research focuses either on only a single layer of garment on a human model or on generating multiple garment layers without any guarantee of an intersection-free geometric relationship between them. In reality, people wear multiple layers of garments in their daily life, where an inner layer of garment could be partially covered by an outer one. In this paper, we try to address this multi-layer modeling problem and propose the Layered-Garment Net (LGN), which is capable of generating intersection-free multiple layers of garments defined by implicit function fields over the body surface, given the person's near front-view image. With a special design of garment indication fields (GIF), we can enforce an implicit covering relationship between the signed distance fields (SDF) of different layers to avoid self-intersections among different garment surfaces and the human body. Experiments demonstrate the strength of our proposed LGN framework in generating multi-layer garments as compared to state-of-the-art methods. To the best of our knowledge, LGN is the first research work to generate intersection-free multiple layers of garments on the human body from a single image.
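
To make the idea of a covering relationship between signed distance fields concrete, here is a minimal sketch under one stated assumption: SDF values are negative inside a surface and positive outside, so an outer layer encloses an inner one when its SDF is no larger than the inner SDF at every sample point. The helper names `sphere_sdf` and `covering_violation` are hypothetical, concentric spheres stand in for garment layers, and the sketch does not reproduce the garment indication fields described in the abstract.

```python
import numpy as np

def sphere_sdf(points, radius):
    """Signed distance to a sphere at the origin: negative inside, positive outside."""
    return np.linalg.norm(points, axis=1) - radius

def covering_violation(sdf_inner, sdf_outer):
    """Mean amount by which the outer layer fails to enclose the inner layer.

    With negative-inside SDFs, enclosure means sdf_outer <= sdf_inner everywhere,
    so only positive values of (sdf_outer - sdf_inner) are penalised.
    """
    return float(np.mean(np.maximum(0.0, sdf_outer - sdf_inner)))

rng = np.random.default_rng(0)
points = rng.uniform(-3.0, 3.0, size=(1000, 3))
inner = sphere_sdf(points, radius=1.0)   # stand-in for an inner layer
outer = sphere_sdf(points, radius=1.5)   # stand-in for an outer layer

ok = covering_violation(inner, outer)    # outer encloses inner
bad = covering_violation(outer, inner)   # layers swapped on purpose
```

A penalty of this shape is zero when the ordering of layers is respected and grows with the depth of any interpenetration, which is one simple way such a constraint could be expressed.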

Citations: 4

RDRN: Recursively Defined Residual Network for Image Super-Resolution

Pub Date : 2022-11-17 DOI: 10.48550/arXiv.2211.09462

Alexander Panaetov, Karim Elhadji Daou, Igor Samenko, Evgeny Tetin, Ilya A Ivanov

Deep convolutional neural networks (CNNs) have obtained remarkable performance in single image super-resolution (SISR). However, very deep networks can suffer from training difficulty and hardly achieve further performance gain. There are two main trends to solve that problem: improving the network architecture for better propagation of features through a large number of layers, and designing an attention mechanism for selecting the most informative features. Recent SISR solutions propose advanced attention and self-attention mechanisms. However, constructing a network that uses an attention block in the most efficient way is a challenging problem. To address this issue, we propose a general recursively defined residual block (RDRB) for better feature extraction and propagation through network layers. Based on RDRB, we design the recursively defined residual network (RDRN), a novel network architecture which utilizes attention blocks efficiently. Extensive experiments show that the proposed model achieves state-of-the-art results on several popular super-resolution benchmarks and outperforms previous methods by up to 0.43 dB.
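
The abstract reports its gain in decibels without naming the metric. Assuming it refers to peak signal-to-noise ratio (PSNR), the usual decibel-valued measure in super-resolution benchmarks, the sketch below shows how such a number is computed. The function name `psnr` and the toy images are illustrative and not taken from the paper.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_value]."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy example: a constant error of 0.1 gives MSE = 0.01, i.e. 20 dB.
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)
value = psnr(reference, estimate)
```

Because PSNR is logarithmic in the mean squared error, a 0.43 dB improvement corresponds to roughly 9% lower mean squared error, whatever the starting value.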

Citations: 2

Few-shot Metric Learning: Online Adaptation of Embedding for Retrieval

Pub Date : 2022-11-14 DOI: 10.48550/arXiv.2211.07116

Deunsol Jung, Dahyun Kang, Suha Kwak, Minsu Cho

Metric learning aims to build a distance metric, typically by learning an effective embedding function that maps similar objects into nearby points in its embedding space. Despite recent advances in deep metric learning, it remains challenging for the learned metric to generalize to unseen classes with a substantial domain gap. To tackle the issue, we explore a new problem of few-shot metric learning that aims to adapt the embedding function to the target domain with only a small amount of annotated data. We introduce three few-shot metric learning baselines and propose the Channel-Rectifier Meta-Learning (CRML), which effectively adapts the metric space online by adjusting channels of intermediate layers. Experimental analyses on miniImageNet, CUB-200-2011, MPII, as well as a new dataset, miniDeepFashion, demonstrate that our method consistently improves the learned metric by adapting it to target classes and achieves a greater gain in image retrieval when the domain gap from the source classes is larger.
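
For readers unfamiliar with how a learned embedding is used for retrieval, the following sketch ranks gallery items by cosine similarity to each query and scores the result with Recall@1. It assumes embeddings have already been computed; the function name `recall_at_1` and the toy vectors are hypothetical, and the sketch does not implement CRML or any of the paper's baselines.

```python
import numpy as np

def recall_at_1(query_emb, query_labels, gallery_emb, gallery_labels):
    """Fraction of queries whose nearest gallery item shares their label."""
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    nearest = np.argmax(q @ g.T, axis=1)
    return float(np.mean(gallery_labels[nearest] == query_labels))

# Toy 2-D embeddings: one gallery item per class.
gallery_emb = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
gallery_labels = np.array([0, 1, 2])

# The last query belongs to class 0 but sits slightly closer to class 1.
query_emb = np.array([[0.9, 0.1], [0.1, 0.9], [-0.8, 0.2], [0.6, 0.7]])
query_labels = np.array([0, 1, 2, 0])

score = recall_at_1(query_emb, query_labels, gallery_emb, gallery_labels)
```

Three of the four toy queries retrieve an item of their own class, so the score is 0.75; adapting the embedding to the target classes is meant to fix cases like the fourth query.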

Citations: 3

Cross-Domain Local Characteristic Enhanced Deepfake Video Detection

Pub Date : 2022-11-07 DOI: 10.48550/arXiv.2211.03346

Zihan Liu, Hanyi Wang, Shilin Wang

As ultra-realistic face forgery techniques emerge, deepfake detection has attracted increasing attention due to security concerns. Many detectors cannot achieve accurate results when detecting unseen manipulations despite excellent performance on known forgeries. In this paper, we are motivated by the observation that the discrepancies between real and fake videos are extremely subtle and localized, and inconsistencies or irregularities can exist in some critical facial regions across various information domains. To this end, we propose a novel pipeline, Cross-Domain Local Forensics (XDLF), for more general deepfake video detection. In the proposed pipeline, a specialized framework is presented to simultaneously exploit local forgery patterns from the space, frequency, and time domains, thus learning cross-domain features to detect forgeries. Moreover, the framework leverages four high-level forgery-sensitive local regions of a human face to guide the model to enhance subtle artifacts and localize potential anomalies. Extensive experiments on several benchmark datasets demonstrate the impressive performance of our method, and we achieve superiority over several state-of-the-art methods on cross-dataset generalization. We also examine the factors that contribute to its performance through ablations, which suggest that exploiting cross-domain local characteristics is a noteworthy direction for developing more general deepfake detectors.

Citations: 2

Photorealistic Facial Wrinkles Removal

Pub Date : 2022-11-03 DOI: 10.48550/arXiv.2211.01930

Marcelo Sanchez, G. Triginer, C. Ballester, Lara Raad, Eduard Ramon

Editing and retouching facial attributes is a complex task that usually requires human artists to obtain photo-realistic results. Its applications are numerous and can be found in several contexts such as cosmetics or digital media retouching, to name a few. Recently, advancements in conditional generative modeling have shown astonishing results in modifying facial attributes in a realistic manner. However, current methods are still prone to artifacts, and focus on modifying global attributes like age and gender, or local mid-sized attributes like glasses or moustaches. In this work, we revisit a two-stage approach for retouching facial wrinkles and obtain results with unprecedented realism. First, a state-of-the-art wrinkle segmentation network is used to detect the wrinkles within the facial region. Then, an inpainting module is used to remove the detected wrinkles, filling them in with a texture that is statistically consistent with the surrounding skin. To achieve this, we introduce a novel loss term that reuses the wrinkle segmentation network to penalize those regions that still contain wrinkles after the inpainting. We evaluate our method qualitatively and quantitatively, showing state-of-the-art results for the task of wrinkle removal. Moreover, we introduce the first high-resolution dataset, named FFHQ-Wrinkles, to evaluate wrinkle detection methods.
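
The loss term described above, which penalizes regions that still contain wrinkles after inpainting, can be pictured with a small sketch. It assumes the segmentation network outputs a per-pixel wrinkle probability for the inpainted image and averages that probability over the inpainted region. The function name `residual_wrinkle_penalty` and the averaging form are assumptions; the abstract does not give the actual formula, and plain arrays stand in for network outputs here.

```python
import numpy as np

def residual_wrinkle_penalty(wrinkle_prob, inpaint_mask):
    """Average wrinkle probability inside the inpainted region (illustrative form).

    wrinkle_prob: per-pixel probabilities in [0, 1], as a segmentation
                  network might produce for the inpainted image.
    inpaint_mask: 1 where pixels were inpainted, 0 elsewhere.
    """
    wrinkle_prob = np.asarray(wrinkle_prob, dtype=float)
    inpaint_mask = np.asarray(inpaint_mask, dtype=float)
    area = inpaint_mask.sum()
    if area == 0:
        return 0.0  # nothing was inpainted, so nothing to penalise
    return float((wrinkle_prob * inpaint_mask).sum() / area)

# A 2x2 inpainted patch inside a 4x4 image.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

clean_prob = np.zeros((4, 4))        # no wrinkle detected after inpainting
leftover_prob = np.zeros((4, 4))
leftover_prob[1, 1] = 0.8            # one pixel still looks like a wrinkle

penalty_clean = residual_wrinkle_penalty(clean_prob, mask)
penalty_leftover = residual_wrinkle_penalty(leftover_prob, mask)
```

In a training setup, a term like this would be added to the inpainting objective so that the generator is discouraged from reproducing wrinkle-like texture in the filled region.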

Citations: 2

A Joint Framework Towards Class-aware and Class-agnostic Alignment for Few-shot Segmentation

Pub Date : 2022-11-02 DOI: 10.48550/arXiv.2211.01310

Kai Huang, Mingfei Cheng, Yang Wang, Bochen Wang, Ye Xi, Fei Wang, Peng Chen

Few-shot segmentation (FSS) aims to segment objects of unseen classes given only a few annotated support images. Most existing methods simply stitch query features with independent support prototypes and segment the query image by feeding the mixed features to a decoder. Although significant improvements have been achieved, existing methods still face class biases due to class variants and background confusion. In this paper, we propose a joint framework that combines more valuable class-aware and class-agnostic alignment guidance to facilitate the segmentation. Specifically, we design a hybrid alignment module which establishes multi-scale query-support correspondences to mine the most relevant class-aware information for each query image from the corresponding support features. In addition, we explore utilizing base-class knowledge to generate a class-agnostic prior mask which makes a distinction between real background and foreground by highlighting all object regions, especially those of unseen classes. By jointly aggregating class-aware and class-agnostic alignment guidance, better segmentation performance is obtained on query images. Extensive experiments on the PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that our proposed joint framework performs better, especially in the 1-shot setting.



Citations: 1

BOREx: Bayesian-Optimization-Based Refinement of Saliency Map for Image- and Video-Classification Models

Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision

Pub Date : 2022-10-31 DOI: 10.48550/arXiv.2210.17130

Atsushi Kikuchi, Kotaro Uchida, Masaki Waga, Kohei Suenaga

Explaining a classification result produced by an image- and video-classification model is one of the important but challenging issues in computer vision. Many methods have been proposed for producing heat-map-based explanations for this purpose, including ones based on the white-box approach that uses the internal information of a model (e.g., LRP, Grad-CAM, and Grad-CAM++) and ones based on the black-box approach that does not use any internal information (e.g., LIME, SHAP, and RISE). We propose a new black-box method, BOREx (Bayesian Optimization for Refinement of visual model Explanation), to refine a heat map produced by any method. Our observation is that a heat-map-based explanation can be seen as a prior for an explanation method based on Bayesian optimization. Based on this observation, BOREx conducts Gaussian process regression (GPR) to estimate the saliency of each pixel in a given image, starting from the heat map produced by another explanation method. Our experiments statistically demonstrate that the refinement by BOREx improves low-quality heat maps for image- and video-classification results.
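
The idea of treating an existing heat map as a prior for Gaussian process regression can be illustrated with a minimal sketch. It shows only the GP posterior mean with a non-zero prior mean, under assumptions that the abstract does not state: a squared-exponential kernel over pixel coordinates, a small fixed noise term, and saliency observations supplied by the caller. It does not cover the Bayesian-optimization loop or how BOREx obtains its observations, and the names `rbf`, `solve`, and `refine_heatmap` are hypothetical.

```python
import math


def rbf(p, q, length_scale=1.0):
    """Squared-exponential kernel between two pixel coordinates (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def refine_heatmap(prior, observed, noise=1e-6, length_scale=1.0):
    """Return the GP posterior mean at every pixel.

    prior: H x W nested lists, the initial heat map used as the GP prior mean.
    observed: dict mapping (row, col) to a measured saliency value.
    """
    pts = list(observed)
    gram = [
        [rbf(p, q, length_scale) + (noise if i == j else 0.0) for j, q in enumerate(pts)]
        for i, p in enumerate(pts)
    ]
    # Residuals between the observations and what the prior heat map predicted.
    resid = [observed[p] - prior[p[0]][p[1]] for p in pts]
    alpha = solve(gram, resid)
    return [
        [
            prior[r][c] + sum(a * rbf((r, c), p, length_scale) for a, p in zip(alpha, pts))
            for c in range(len(prior[0]))
        ]
        for r in range(len(prior))
    ]


# Toy example: a flat prior heat map corrected by one observation at pixel (0, 0).
prior_map = [[0.5, 0.5], [0.5, 0.5]]
refined = refine_heatmap(prior_map, {(0, 0): 1.0})
unchanged = refine_heatmap(prior_map, {})
```

With no observations the posterior mean equals the prior heat map, and each observation pulls nearby pixels toward the measured value with a strength that decays with distance, which is the sense in which the initial heat map acts as a starting point.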



Citations: 1