Computer vision - ACCV ... : ... Asian Conference on Computer Vision : proceedings. Asian Conference on Computer Vision: Latest Articles

QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation

Pub Date : 2022-03-22 DOI: 10.48550/arXiv.2203.11632

Yuxin Hong, Xuelin Qian, Simian Luo, X. Xue, Yanwei Fu

This paper studies the task of conditional Human Motion Animation (cHMA). Given a source image and a driving video, the model should generate a new frame sequence in which the person in the source image performs motion similar to the pose sequence from the driving video. Despite the success of Generative Adversarial Network (GAN) methods in image and video synthesis, cHMA remains very challenging because it is difficult both to use the conditional guidance (such as images or poses) efficiently and to generate images of good visual quality. To this end, this paper proposes a novel model that learns to Quantize, Scrabble, and Craft (QS-Craft) for conditional human motion animation. The key novelties come from three newly introduced steps: quantize, scrabble, and craft. In particular, our QS-Craft employs a transformer in its structure to exploit attention. The guidance is represented as a pose coordinate sequence extracted from the driving video. Extensive experiments on human motion datasets validate the efficacy of our model.
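The abstract names a quantize step but does not describe it. As a rough illustration only, the sketch below assumes the common reading of "quantize" as vector quantization, where each feature vector is replaced by the index of its nearest entry in a codebook. The function names `quantize` and `dequantize`, the squared-L2 distance, and the toy codebook are assumptions made for this example and are not taken from the paper.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2).

    Illustrative assumption: a fixed codebook given as a list of vectors.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))
        for vec in vectors
    ]


def dequantize(indices, codebook):
    """Look the discrete indices back up in the codebook."""
    return [codebook[i] for i in indices]


# Toy codebook with three 2-D entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.2], [3.0, 0.5]]

indices = quantize(features, codebook)
reconstruction = dequantize(indices, codebook)
```

Working with discrete indices rather than continuous features is what would let a sequence model such as a transformer treat image content as tokens; whether QS-Craft does exactly this is not stated in the abstract.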

Citations: 0

MatchFormer: Interleaving Attention in Transformers for Feature Matching

Pub Date : 2022-03-17 DOI: 10.48550/arXiv.2203.09645

Qing Wang, Jiaming Zhang, Kailun Yang, Kunyu Peng, R. Stiefelhagen

Local feature matching is a computationally intensive task at the subpixel level. Detector-based methods coupled with feature descriptors struggle in low-texture scenes, while CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder relieves the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or with less outdoor training data. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer uses only 45% of the GFLOPs, yet achieves a +1.3% precision gain and a 41% running speed boost. The large MatchFormer reaches state-of-the-art results on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatch), and visual localization (InLoc).
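The interleaving of self-attention and cross-attention described in the abstract can be sketched in a few lines. This is a minimal illustration and not the MatchFormer implementation: it assumes plain scaled dot-product attention with identity projections (no learned weights, no multi-head split, no positional encoding), and the names `attention` and `interleaved_stage` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def add(a, b):
    """Element-wise residual sum of two lists of vectors."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def interleaved_stage(feat_a, feat_b):
    """One stage: self-attention per image, then cross-attention between images."""
    # Self-attention: each image attends to its own features (extraction).
    a = add(feat_a, attention(feat_a, feat_a, feat_a))
    b = add(feat_b, attention(feat_b, feat_b, feat_b))
    # Cross-attention: queries from one image, keys/values from the other (matching).
    a_out = add(a, attention(a, b, b))
    b_out = add(b, attention(b, a, a))
    return a_out, b_out


# Toy inputs: two features from image A, one feature from image B.
a_out, b_out = interleaved_stage([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
```

Both outputs keep the shape of their inputs, so stages like this can be stacked; the abstract's point is that matching information enters inside the encoder instead of being left entirely to a decoder.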

Citations: 51

GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework

Pub Date : 2022-03-08 DOI: 10.48550/arXiv.2203.03966

Ming-Zhen Wang, Beibei Lin, Xianda Guo, Lincheng Li, Zhenguo Zhu, Jiande Sun, Shunli Zhang, Xin Yu

Many gait recognition methods first partition the human gait into N parts and then combine them to establish part-based feature representations. Their recognition performance is often affected by the partitioning strategy, which is chosen empirically for each dataset. However, we observe that strips, as the basic component of parts, are agnostic to the partitioning strategy. Motivated by this observation, we present a strip-based multi-level gait recognition network, named GaitStrip, to extract comprehensive gait information at different levels. Specifically, our high-level branch explores the context of gait sequences and our low-level one focuses on detailed posture changes. We introduce a novel StriP-Based feature extractor (SPB) to learn strip-based feature representations by directly taking each strip of the human body as the basic unit. Moreover, we propose a novel multi-branch structure, called the Enhanced Convolution Module (ECM), to extract different representations of gaits. ECM consists of the Spatial-Temporal feature extractor (ST), the Frame-Level feature extractor (FL), and SPB, and has two obvious advantages. First, each branch focuses on a specific representation, which can be used to improve the robustness of the network: ST aims to extract spatial-temporal features of gait sequences, while FL is used to generate the feature representation of each frame. Second, the parameters of the ECM can be reduced at test time by introducing a structural re-parameterization technique. Extensive experimental results demonstrate that our GaitStrip achieves state-of-the-art performance in both normal walking and complex conditions.
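To make the strip-versus-part distinction concrete, the sketch below treats each row of a 2-D feature map as one horizontal strip and pools it to a single value, then shows how part-level features for any partition can be derived from those strips. It is an illustration under stated assumptions, not the SPB module: max pooling, the equal-size partition, and the names `strip_features` and `part_features` are all choices made for this example.

```python
def strip_features(feature_map):
    """One feature per horizontal strip (row): max over the width axis."""
    return [max(row) for row in feature_map]


def part_features(strips, num_parts):
    """Group consecutive strips into num_parts equal parts and max-pool each."""
    if len(strips) % num_parts != 0:
        raise ValueError("num_parts must divide the number of strips")
    size = len(strips) // num_parts
    return [max(strips[i:i + size]) for i in range(0, len(strips), size)]


# Toy 4x3 feature map (height x width), e.g. from a silhouette frame.
feature_map = [
    [0, 1, 0],
    [2, 0, 0],
    [0, 0, 3],
    [1, 1, 1],
]

strips = strip_features(feature_map)   # one value per row
two_parts = part_features(strips, 2)   # coarse partition
four_parts = part_features(strips, 4)  # finest partition equals the strips
```

The strip representation is computed once and does not depend on `num_parts`, which is the sense in which strips are agnostic to the partitioning strategy.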

Citations: 6

Exploring Adversarially Robust Training for Unsupervised Domain Adaptation

Pub Date : 2022-02-18 DOI: 10.1007/978-3-031-26351-4_34

Shao-Yuan Lo, Vishal M. Patel

Citations: 2

RA Loss: Relation-Aware Loss for Robust Person Re-identification

Pub Date : 2022-01-01 DOI: 10.1007/978-3-031-26284-5_23

Kan Wang, Shuping Hu, Jun Cheng, Jianxin Pang, Huan Tan

Citations: 1

Rove-Tree-11: The Not-so-Wild Rover a Hierarchically Structured Image Dataset for Deep Metric Learning Research

Pub Date : 2022-01-01 DOI: 10.1007/978-3-031-26348-4_25

R. Hunt, K. S. Pedersen

Citations: 0

Causal Property Based Anti-conflict Modeling with Hybrid Data Augmentation for Unbiased Scene Graph Generation

Pub Date : 2022-01-01 DOI: 10.1007/978-3-031-26316-3_34

Ruonan Zhang, Gaoyun An

Citations: 1

FAPN: Face Alignment Propagation Network for Face Video Super-Resolution

Pub Date : 2022-01-01 DOI: 10.1007/978-3-031-27066-6_1

Sige Bian, He Li, Fei Yu, Jiyuan Liu, Changjun Song, Yongming Tang

Citations: 0

Unsupervised Online Hashing with Multi-Bit Quantization

Pub Date : 2022-01-01 DOI: 10.1007/978-3-031-26293-7_39

Zhenyu Weng, Yuesheng Zhu

Citations: 0

Self-Supervised Dehazing Network Using Physical Priors

Pub Date : 2022-01-01 DOI: 10.1007/978-3-031-26313-2_18

Gwangjin Ju, Y. Choi, Donggun Lee, J. Paik, Gyeongha Hwang, Seungyong Lee

Citations: 0
