Compact Video Code and Its Application to Robust Face Retrieval in TV-Series
Yan Li, Ruiping Wang, Zhen Cui, S. Shan, Xilin Chen
Proceedings of the British Machine Vision Conference 2014
DOI: 10.5244/C.28.93 (https://doi.org/10.5244/C.28.93)
Citations: 16
Abstract
We address the problem of video face retrieval in TV-Series: given one video clip of a specific character, the task is to find the other video clips in which that character appears. This is challenging for two reasons. First, faces in TV-Series are captured under largely uncontrolled conditions and show complex appearance variations. Second, retrieval typically requires an efficient representation with low time and space complexity. To handle this problem, we propose a compact and discriminative representation for large volumes of video data, named Compact Video Code (CVC). Our method first models a video clip by its sample (i.e., frame) covariance matrix, which captures the variations in the video data statistically. To incorporate discriminative information and obtain a more compact video signature, the high-dimensional covariance matrix is then encoded as a much lower-dimensional binary vector, which yields the proposed CVC. Specifically, each bit of the code, i.e., each dimension of the binary vector, is produced by supervised learning in a max-margin framework that balances the discriminability and stability of the code. Face retrieval experiments on two challenging TV-Series video databases demonstrate the competitiveness of the proposed CVC against state-of-the-art retrieval methods. In addition, as a general video matching algorithm, CVC is evaluated on the traditional video face recognition task using a standard Internet database, YouTube Celebrities, where it shows promising performance with an extremely compact code of only 128 bits.
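The pipeline described in the abstract (clip, then frame covariance, then binary code, then retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the covariance matrix is vectorized or how the codes are compared, so the upper-triangular vectorization and Hamming-distance ranking here are assumptions. The random hyperplanes are only a stand-in for the per-bit classifiers that the paper learns in a supervised max-margin framework. All function names and the feature dimension are hypothetical.

```python
import numpy as np

def clip_covariance(frames: np.ndarray) -> np.ndarray:
    """Sample covariance of a clip; frames has shape (n_frames, d)."""
    return np.cov(frames, rowvar=False)

def encode(cov: np.ndarray, hyperplanes: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """Map a d x d covariance matrix to a binary code, one bit per hyperplane.

    Assumption: the matrix is flattened through its upper triangle. In the
    paper each hyperplane would be learned with supervision; here the caller
    supplies them.
    """
    upper = np.triu_indices(cov.shape[0])
    v = cov[upper]
    return (hyperplanes @ v + biases > 0).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of bit positions in which two codes differ."""
    return int(np.count_nonzero(a != b))

def rank_gallery(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Gallery indices sorted by increasing Hamming distance to the query."""
    dists = np.count_nonzero(gallery != query, axis=1)
    return np.argsort(dists, kind="stable")

# Toy data: one clip of 20 frames with 8-dimensional frame features.
rng = np.random.default_rng(0)
d, n_bits = 8, 128
clip = rng.normal(size=(20, d))
cov = clip_covariance(clip)

# Placeholder (untrained) hyperplanes over the d*(d+1)/2 vectorized entries.
n_entries = d * (d + 1) // 2
hyperplanes = rng.normal(size=(n_bits, n_entries))
biases = np.zeros(n_bits)

code = encode(cov, hyperplanes, biases)
packed = np.packbits(code)  # 128 bits stored in 16 bytes
```

Packing the bits shows why such a code suits large-scale retrieval: a 128-bit signature occupies 16 bytes per clip, and comparing two signatures reduces to counting differing bits.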