Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836048
Zhangyang Wang, Zhaowen Wang, Shiyu Chang, Jianchao Yang, Thomas S. Huang
Existing example-based super-resolution (SR) methods are built upon either external examples or self-examples. Although effective in certain cases, both kinds of method suffer from inherent limitations. This paper goes beyond these two most common classes of example-based SR approaches and proposes a novel joint SR perspective. The joint SR exploits and maximizes the complementary advantages of external- and self-example-based methods. We elaborate on exploitable priors for image components of different nature, and formulate their corresponding loss functions mathematically. Equipped with these, we construct a unified SR formulation and propose an iterative joint super-resolution (IJSR) algorithm to solve the optimization. This joint perspective leads to a marked improvement in SR results, both quantitatively and qualitatively.
Title: A joint perspective towards image super-resolution: Unifying external- and self-examples. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 596-603. URL: https://doi.org/10.1109/WACV.2014.6836048
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836003
Hoan Nguyen, Thomas Fasciano, D. Charbonneau, A. Dornhaus, M. Shin
The tracking of ants in video is important for the analysis of their complex group behavior. However, the manual analysis of these videos is tedious and time consuming. Automated tracking methods tend to drift because of frequent occlusions during interactions and the similarity in appearance between ants. Semi-automated tracking methods enable correction of tracking errors by incorporating user interaction. Although much lower than for manual analysis, the user time required by the existing method is still typically 23 times the actual video length. In this paper, we propose a new semi-automated method that achieves similar accuracy while reducing the user interaction time by (1) mitigating user wait time by incorporating a data association tracking method to separate the tracking from user correction, and (2) minimizing the number of candidates shown to the user during correction. The proposed method reduces the user interaction time by 67% while maintaining accuracy within 3% of the previous semi-automated method [11].
Title: Data association based ant tracking with interactive error correction. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 941-946. URL: https://doi.org/10.1109/WACV.2014.6836003
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836098
Yin Cui, Yongzhou Xiang, Kun Rong, R. Feris, Liangliang Cao
We propose a spatial-color layout feature specially designed for galaxy images. Inspired by findings on galaxy formation and evolution from astronomy, the proposed feature captures both global and local morphological information of galaxies. In addition, our feature is scale and rotation invariant. By developing a hashing-based approach with the proposed feature, we implemented an efficient galaxy image retrieval system on a dataset of more than 280 thousand galaxy images from the Sloan Digital Sky Survey project. Given a query image, the proposed system can rank-order all galaxies in the dataset by relevance in only 35 milliseconds on a single PC. To the best of our knowledge, this is one of the first works on galaxy-specific feature design and large-scale galaxy image retrieval. We evaluated the proposed feature and the galaxy image retrieval system using web user annotations, showing that the proposed feature outperforms other classic features, including HOG, Gist, LBP, and color histograms. The success of our retrieval system demonstrates the advantages of leveraging computer vision techniques for problems in astronomy.
Title: A spatial-color layout feature for representing galaxy images. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 213-219. URL: https://doi.org/10.1109/WACV.2014.6836098
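The galaxy-retrieval abstract above mentions a hashing-based approach but does not say how codes are compared. As a rough illustration only, the sketch below assumes each image has already been reduced to a short binary code and ranks the database by Hamming distance to the query. The code values, the names `hamming` and `rank_by_hamming`, and the choice of Hamming distance are all assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: rank items by Hamming distance between binary hash codes.
# The codes are made-up integers, not features computed by the paper's method.

def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query: int, database: dict) -> list:
    """Return database keys ordered from nearest to farthest code (ties broken by key)."""
    return sorted(database, key=lambda name: (hamming(query, database[name]), name))

# Toy database of three 8-bit codes (illustrative values only).
database = {
    "galaxy_a": 0b10110010,
    "galaxy_b": 0b10110011,
    "galaxy_c": 0b01001101,
}
ranking = rank_by_hamming(0b10110010, database)
print(ranking)
```

Comparing integer codes with XOR and a bit count is cheap, which is one common reason hashing schemes are used for large-scale retrieval. Whether the paper's system works this way is not stated in the abstract.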
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6835999
Jeremiah R. Barr, Leonardo A. Cament, K. Bowyer, P. Flynn
We introduce a method for extracting the social network structure for the persons appearing in a set of video clips. Individuals are unknown, and are not matched against known enrollments. An identity cluster representing an individual is formed by grouping similar-appearing faces from different videos. Each identity cluster is represented by a node in the social network. Two nodes are linked if the faces from their clusters appeared together in one or more video frames. Our approach incorporates a novel active clustering technique to create more accurate identity clusters based on feedback from the user about ambiguously matched faces. The final output consists of one or more network structures that represent the social group(s), and a list of persons who potentially connect multiple social groups. Our results demonstrate the efficacy of the proposed clustering algorithm and network analysis techniques.
Title: Active Clustering with Ensembles for Social structure extraction. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 969-976. URL: https://doi.org/10.1109/WACV.2014.6835999
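The linking rule in the abstract above (two nodes are linked if faces from their clusters appear together in at least one frame) can be sketched in a few lines. This is a minimal illustration that assumes the identity clusters are already known and that each frame is given as a set of cluster ids. The ids, the function names, and the use of connected components to stand in for "social groups" are assumptions, and the active clustering step itself is not shown.

```python
from itertools import combinations

def build_network(frames):
    """Build an undirected co-occurrence graph.

    frames: iterable of sets, each holding the identity-cluster ids seen in one frame.
    Returns a dict mapping each id to the set of ids it co-occurred with.
    """
    edges = {}
    for ids in frames:
        for i in ids:
            edges.setdefault(i, set())          # keep isolated identities as nodes
        for a, b in combinations(sorted(ids), 2):
            edges[a].add(b)                     # link every pair seen together
            edges[b].add(a)
    return edges

def social_groups(edges):
    """Return the connected components of the graph (depth-first search)."""
    seen, groups = set(), []
    for start in sorted(edges):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(edges[node] - seen)
        groups.append(group)
    return groups

# Hypothetical per-frame detections after clustering.
frames = [{"id1", "id2"}, {"id2", "id3"}, {"id4", "id5"}, {"id6"}]
network = build_network(frames)
groups = social_groups(network)
print(groups)
```

Here `id2` ties `id1` and `id3` into one group even though those two never share a frame, which is the kind of indirect connection a co-occurrence network exposes.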
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836023
Radu Dondera, Vlad I. Morariu, Yulu Wang, L. Davis
We propose an interactive video segmentation system built on occlusion and long-term spatio-temporal structure cues. User supervision is incorporated in a superpixel graph clustering framework that differs crucially from prior art in that it modifies the graph according to the output of an occlusion boundary detector. Working with long temporal intervals (up to 100 frames) enables our system to significantly reduce annotation effort with respect to state-of-the-art systems. Even though the segmentation results are less than perfect, they are obtained efficiently and can be used in weakly supervised learning from video or for video content description. We do not rely on a discriminative object appearance model and allow extracting multiple foreground objects together, saving user time if more than one object is present. Additional experiments with unsupervised clustering based on occlusion boundaries demonstrate the importance of this cue for video segmentation and thus validate our system design.
Title: Interactive video segmentation using occlusion boundaries and temporally coherent superpixels. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 784-791. URL: https://doi.org/10.1109/WACV.2014.6836023
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6835986
M. Mohammadi, E. Fatemizadeh, M. Mahoor
Automatic recognition of facial expression and of facial identity from visual data are two challenging problems that are tied together. In the past decade, researchers have mostly tried to solve these two problems separately, producing face identification systems that are expression-independent and facial expression recognition systems that are person-independent. This paper presents a new framework using sparse representation for simultaneous recognition of facial expression and identity. Our framework is based on the assumption that any facial appearance is a sparse combination of identities and expressions (i.e., one identity and one expression). Our experimental results on the CK+ and MMI face datasets show that the proposed approach outperforms methods that perform face identification and facial expression recognition individually.
Title: Simultaneous recognition of facial expression and identity via sparse representation. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 1066-1073. URL: https://doi.org/10.1109/WACV.2014.6835986
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836106
Yibing Song, Linchao Bao, Qingxiong Yang
This paper presents a real-time decolorization method. Given the human visual system's preference for luminance information, the luminance should be preserved as much as possible during decolorization. The proposed method therefore measures the amount of color contrast/detail lost when converting color to luminance. The detail loss is estimated by computing the difference between two intermediate images: one obtained by applying a bilateral filter to the original color image, and the other obtained by applying a joint bilateral filter to the original color image with its luminance as the guidance image. The estimated detail loss is then mapped to a grayscale image, called the residual image, by minimizing the difference between the image gradients of the input color image and those of the objective grayscale image, which is the sum of the residual image and the luminance. The residual image is all zeros (that is, the two intermediate images are identical) only when no visual detail is missing in the luminance. Unlike most previous methods, the proposed decolorization method preserves both the contrast in the color image and the luminance. Quantitative evaluation shows that it is the top performer on the standard test suite. It is also very robust and can be used directly to convert videos while maintaining temporal coherence. Specifically, it can convert a high-resolution video (1280 × 720) in real time (about 28 Hz) on a 3.4 GHz i7 CPU.
Title: Real-time video decolorization using bilateral filtering. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 159-166. URL: https://doi.org/10.1109/WACV.2014.6836106
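The detail-loss step in the decolorization abstract above (bilateral filter output minus joint bilateral filter output guided by luminance) can be illustrated on a toy example. The sketch below is a simplification under stated assumptions: it works on a single 1D row of RGB pixels rather than a 2D image, uses Rec. 601 luminance weights, and picks arbitrary filter parameters. None of these choices come from the paper, and the later mapping from detail loss to a residual image via gradient matching is not shown.

```python
import math

def luminance(pixel):
    """Rec. 601 luma of an (r, g, b) tuple; the weighting is an assumption."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def joint_bilateral_1d(pixels, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Smooth `pixels`, taking range weights from `guide` (a list of tuples).

    Passing guide=pixels gives an ordinary bilateral filter.
    """
    out = []
    for i in range(len(pixels)):
        acc, total = [0.0, 0.0, 0.0], 0.0
        for j in range(max(0, i - radius), min(len(pixels), i + radius + 1)):
            d2 = sum((a - b) ** 2 for a, b in zip(guide[i], guide[j]))
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
                 * math.exp(-d2 / (2 * sigma_r ** 2)))
            total += w
            for c in range(3):
                acc[c] += w * pixels[j][c]
        out.append(tuple(a / total for a in acc))
    return out

# Two colors with equal luminance: the edge is visible in color but not in luma.
row = [(0.587, 0.0, 0.0)] * 4 + [(0.0, 0.299, 0.0)] * 4
lum_guide = [(luminance(p),) for p in row]

bf = joint_bilateral_1d(row, row)          # bilateral: guided by the color itself
jbf = joint_bilateral_1d(row, lum_guide)   # joint bilateral: guided by luminance
detail_loss = [sum(abs(a - b) for a, b in zip(p, q)) for p, q in zip(bf, jbf)]
print([round(v, 3) for v in detail_loss])
```

In this toy row the color-guided filter keeps the red/green edge, while the luminance-guided filter blurs across it because the guide sees no edge. The difference is therefore zero far from the edge and clearly nonzero next to it, which is the signal the abstract describes as lost detail.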
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836020
Rahul Dutta, B. Draper, J. Beveridge
Handheld videos include unintentional motion (jitter) and often intentional motion (pan and/or zoom). Human viewers prefer to see jitter removed, creating a smoothly moving camera. For video analysis, in contrast, aligning to a fixed stable background is sometimes preferable. This paper presents an algorithm that removes both forms of motion using a novel and efficient way of tracking background points while ignoring moving foreground points. The approach is related to image mosaicing, but the result is a video rather than an enlarged still image. It is also related to multiple object tracking approaches, but simpler since moving objects need not be explicitly tracked. The algorithm presented takes as input a video and returns one or several stabilized videos. Videos are broken into parts when the algorithm detects the background changing and it becomes necessary to fix upon a new background. Our approach assumes the person holding the camera is standing in one place and that objects in motion do not dominate the image. Our algorithm performs better than several previously published approaches when compared on 1,401 handheld videos from the recently released Point-and-Shoot Face Recognition Challenge (PASC). The source code for this algorithm is being made available.
Title: Video alignment to a common reference. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 808-815. URL: https://doi.org/10.1109/WACV.2014.6836020
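The video-alignment abstract above relies on tracking background points while ignoring moving foreground points, under the assumption that moving objects do not dominate the image. The paper's own tracking scheme is not described in the abstract, so the sketch below swaps in a much simpler stand-in: it takes the per-axis median displacement of tracked points as the background motion and flags points that disagree with it as foreground. The point coordinates, function names, tolerance, and translation-only motion model are all assumptions for illustration.

```python
from statistics import median

def estimate_background_shift(prev_pts, curr_pts):
    """Estimate a translation-only background motion as the per-axis median displacement.

    The median is robust as long as fewer than half the points lie on moving objects.
    """
    dxs = [c[0] - p[0] for p, c in zip(prev_pts, curr_pts)]
    dys = [c[1] - p[1] for p, c in zip(prev_pts, curr_pts)]
    return median(dxs), median(dys)

def background_mask(prev_pts, curr_pts, tol=1.0):
    """Mark points whose displacement agrees with the background shift within `tol` pixels."""
    dx, dy = estimate_background_shift(prev_pts, curr_pts)
    return [
        abs(c[0] - p[0] - dx) <= tol and abs(c[1] - p[1] - dy) <= tol
        for p, c in zip(prev_pts, curr_pts)
    ]

# Hypothetical tracked points: four follow the camera motion, one sits on a moving object.
prev_pts = [(10, 10), (50, 12), (30, 40), (70, 65), (40, 30)]
curr_pts = [(12, 9), (52, 11), (32, 39), (72, 64), (50, 35)]

shift = estimate_background_shift(prev_pts, curr_pts)
mask = background_mask(prev_pts, curr_pts)
print(shift, mask)
```

Shifting the current frame by the negative of the estimated displacement would align it with the previous frame's background. A real system would need a richer motion model to handle zoom and rotation.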
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6835988
Krishnan Ramnath, Simon Baker, Lucy Vanderwende, M. El-Saban, Sudipta N. Sinha, A. Kannan, N. Hassan, Michel Galley, Yi Yang, Deva Ramanan, Alessandro Bergamo, L. Torresani
AutoCaption is a system that helps a smartphone user generate a caption for their photos. It operates by uploading the photo to a cloud service where a number of parallel modules are applied to recognize a variety of entities and relations. The outputs of the modules are combined to generate a large set of candidate captions, which are returned to the phone. The phone client includes a convenient user interface that allows users to select their favorite caption, reorder, add, or delete words to obtain the grammatical style they prefer. The user can also select from multiple candidates returned by the recognition modules.
Title: AutoCaption: Automatic caption generation for personal photos. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 1050-1057. URL: https://doi.org/10.1109/WACV.2014.6835988
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6835736
Mingliang Xue, A. Mian, Wanquan Liu, Ling Li
Facial expressions form a significant part of our nonverbal communication, and understanding them is essential for effective human-computer interaction. Due to the diversity of facial geometry and expressions, automatic expression recognition is a challenging task. This paper deals with the problem of person-independent facial expression recognition from a single 3D scan. We consider only the 3D shape because facial expressions are mostly encoded in facial geometry deformations rather than textures. Unlike the majority of existing works, our method is fully automatic, including the detection of landmarks. We detect the four eye corners and the nose tip in real time on the depth image and its gradients using Haar-like features and an AdaBoost classifier. From these five points, another 25 heuristic points are defined to extract local depth features for representing facial expressions. The depth features are projected to a lower-dimensional linear subspace, where feature selection is performed by maximizing their relevance and minimizing their redundancy. The selected features are then used to train a multi-class SVM for the final classification. Experiments on the benchmark BU-3DFE database show that the proposed method outperforms existing automatic techniques and is comparable even to approaches using manual landmarks.
Title: Fully automatic 3D facial expression recognition using local depth features. Published in: IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 1096-1103. URL: https://doi.org/10.1109/WACV.2014.6835736
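The feature selection step in the abstract above (maximize relevance, minimize redundancy) can be sketched as a greedy loop. The abstract does not state which relevance and redundancy measures are used, so this illustration substitutes absolute Pearson correlation for both. The feature names, toy values, and scoring rule (relevance minus mean redundancy) are assumptions, not the paper's procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def mrmr(features, target, k):
    """Greedily pick k feature names by relevance to `target` minus mean redundancy.

    features: dict mapping a feature name to its list of values.
    """
    selected, remaining = [], sorted(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = abs(pearson(features[name], target))
            if not selected:
                return relevance
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: depth_a_dup nearly duplicates depth_a, depth_b carries independent signal.
features = {
    "depth_a": [1.0, 1.0, -1.0, -1.0],
    "depth_a_dup": [1.0, 0.9, -1.0, -0.9],
    "depth_b": [1.0, -1.0, 1.0, -1.0],
}
target = [3.0, 1.0, -1.0, -3.0]   # equals 2 * depth_a + depth_b
chosen = mrmr(features, target, 2)
print(chosen)
```

Ranking by relevance alone would pick `depth_a` and its near-duplicate. The redundancy penalty instead favors `depth_b`, which is less correlated with the target but adds information that `depth_a` lacks.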