Learning Gait Representation from Massive Unlabelled Walking Videos: A Benchmark

IF 20.8 | Region 1, Computer Science | Q1 (Computer Science, Artificial Intelligence) | IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2022-06-28 | DOI: 10.48550/arXiv.2206.13964
Chao Fan, Saihui Hou, Jilong Wang, Yongzhen Huang, Shiqi Yu
{"title":"Learning Gait Representation from Massive Unlabelled Walking Videos: A Benchmark","authors":"Chao Fan, Saihui Hou, Jilong Wang, Yongzhen Huang, Shiqi Yu","doi":"10.48550/arXiv.2206.13964","DOIUrl":null,"url":null,"abstract":"Gait depicts individuals' unique and distinguishing walking patterns and has become one of the most promising biometric features for human identification. As a fine-grained recognition task, gait recognition is easily affected by many factors and usually requires a large amount of completely annotated data that is costly and insatiable. This paper proposes a large-scale self-supervised benchmark for gait recognition with contrastive learning, aiming to learn the general gait representation from massive unlabelled walking videos for practical applications via offering informative walking priors and diverse real-world variations. Specifically, we collect a large-scale unlabelled gait dataset GaitLU-1M consisting of 1.02M walking sequences and propose a conceptually simple yet empirically powerful baseline model GaitSSB. Experimentally, we evaluate the pre-trained model on four widely-used gait benchmarks, CASIA-B, OU-MVLP, GREW and Gait3D with or without transfer learning. The unsupervised results are comparable to or even better than the early model-based and GEI-based methods. After transfer learning, GaitSSB outperforms existing methods by a large margin in most cases, and also showcases the superior generalization capacity. Further experiments indicate that the pre-training can save about 50% and 80% annotation costs of GREW and Gait3D. Theoretically, we discuss the critical issues for gait-specific contrastive framework and present some insights for further study. As far as we know, GaitLU-1M is the first large-scale unlabelled gait dataset, and GaitSSB is the first method that achieves remarkable unsupervised results on the aforementioned benchmarks. The source code of GaitSSB and anonymous data of GaitLU-1M is available at https://github.com/ShiqiYu/OpenGait.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":20.8000,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.48550/arXiv.2206.13964","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 10

Abstract

Gait depicts an individual's unique and distinguishing walking pattern and has become one of the most promising biometric features for human identification. As a fine-grained recognition task, gait recognition is easily affected by many factors and usually requires a large amount of fully annotated data, which is costly and hard to obtain. This paper proposes a large-scale self-supervised benchmark for gait recognition based on contrastive learning, aiming to learn a general gait representation from massive unlabelled walking videos for practical applications by offering informative walking priors and diverse real-world variations. Specifically, we collect a large-scale unlabelled gait dataset, GaitLU-1M, consisting of 1.02M walking sequences, and propose a conceptually simple yet empirically powerful baseline model, GaitSSB. Experimentally, we evaluate the pre-trained model on four widely used gait benchmarks, CASIA-B, OU-MVLP, GREW and Gait3D, with and without transfer learning. The unsupervised results are comparable to or even better than those of early model-based and GEI-based methods. After transfer learning, GaitSSB outperforms existing methods by a large margin in most cases and also demonstrates superior generalization capacity. Further experiments indicate that pre-training can save about 50% and 80% of the annotation costs on GREW and Gait3D, respectively. Theoretically, we discuss the critical issues of a gait-specific contrastive framework and present some insights for further study. To the best of our knowledge, GaitLU-1M is the first large-scale unlabelled gait dataset, and GaitSSB is the first method to achieve remarkable unsupervised results on the aforementioned benchmarks. The source code of GaitSSB and the anonymized data of GaitLU-1M are available at https://github.com/ShiqiYu/OpenGait.
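To make the pre-training idea concrete, the sketch below shows one contrastive (InfoNCE-style) pre-training step on a batch of unlabelled silhouette sequences. It is a minimal illustration only, not the authors' GaitSSB implementation (see the OpenGait repository above for that): the `GaitEncoder` architecture, the toy augmentations, and all hyperparameters here are assumptions chosen for brevity.

```python
# Minimal sketch of contrastive pre-training on unlabelled gait sequences.
# NOT the official GaitSSB code; encoder, augmentations and hyperparameters
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaitEncoder(nn.Module):
    """Toy encoder: maps a silhouette sequence (T frames) to one embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, x):                            # x: (B, T, 1, H, W)
        b, t = x.shape[:2]
        f = self.conv(x.flatten(0, 1)).flatten(1)    # per-frame features: (B*T, 64)
        f = f.view(b, t, -1).max(dim=1).values       # temporal max pooling: (B, 64)
        return F.normalize(self.proj(f), dim=1)      # unit-norm embeddings: (B, dim)

def info_nce(z1, z2, tau=0.07):
    """InfoNCE loss: two views of the same sequence are the positive pair;
    all other sequences in the batch serve as negatives."""
    logits = z1 @ z2.t() / tau                       # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))                # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# One pre-training step on a batch of unlabelled sequences (shapes are toy values):
encoder = GaitEncoder()
seqs = torch.rand(8, 30, 1, 64, 44)                  # 8 sequences, 30 frames, 64x44 silhouettes
view1 = seqs + 0.05 * torch.randn_like(seqs)         # toy augmentation: pixel noise
view2 = seqs.flip(1)                                 # toy augmentation: reversed frame order
loss = info_nce(encoder(view1), encoder(view2))
loss.backward()
```

The key property this illustrates is that no identity labels are needed: the supervision signal comes entirely from matching two augmented views of the same walking sequence, which is what allows pre-training on a dataset as large as GaitLU-1M.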
Source Journal

CiteScore: 28.40
Self-citation rate: 3.00%
Articles published per year: 885
Review time: 8.5 months
Journal Introduction: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.
Latest Articles in This Journal

Practical Compact Deep Compressed Sensing
Fine-Grained Visual Text Prompting
Correlation Verification for Image Retrieval and Its Memory Footprint Optimization
Task-Oriented Channel Attention for Fine-Grained Few-Shot Classification
Streaming quanta sensors for online, high-performance imaging and vision