
IEEE Winter Conference on Applications of Computer Vision: Latest Papers

A joint perspective towards image super-resolution: Unifying external- and self-examples
Zhangyang Wang, Zhaowen Wang, Shiyu Chang, Jianchao Yang, Thomas S. Huang
Existing example-based super resolution (SR) methods are built upon either external-examples or self-examples. Although effective in certain cases, both methods suffer from their inherent limitation. This paper goes beyond these two classes of most common example-based SR approaches, and proposes a novel joint SR perspective. The joint SR exploits and maximizes the complementary advantages of external- and self-example based methods. We elaborate on exploitable priors for image components of different nature, and formulate their corresponding loss functions mathematically. Equipped with that, we construct a unified SR formulation, and propose an iterative joint super resolution (IJSR) algorithm to solve the optimization. Such a joint perspective approach leads to an impressive improvement of SR results both quantitatively and qualitatively.
DOI: 10.1109/WACV.2014.6836048 (https://doi.org/10.1109/WACV.2014.6836048); published 2014-03-24
Citations: 9
Data association based ant tracking with interactive error correction
Hoan Nguyen, Thomas Fasciano, D. Charbonneau, A. Dornhaus, M. Shin
The tracking of ants in video is important for the analysis of their complex group behavior. However, the manual analysis of these videos is tedious and time consuming. Automated tracking methods tend to drift due to frequent occlusions during their interactions and similarity in appearance. Semi-automated tracking methods enable corrections of tracking errors by incorporating user interaction. Although it is much lower than manual analysis, the required user time of the existing method is still typically 23 times the actual video length. In this paper, we propose a new semi-automated method that achieves similar accuracy while reducing the user interaction time by (1) mitigating user wait time by incorporating a data association tracking method to separate the tracking from user correction, and (2) minimizing the number of candidates visualized for user during correction. This proposed method is able to reduce the user interaction time by 67% while maintaining the accuracy within 3% of the previous semi-automated method [11].
DOI: 10.1109/WACV.2014.6836003 (https://doi.org/10.1109/WACV.2014.6836003); published 2014-03-24
Citations: 6
A spatial-color layout feature for representing galaxy images
Yin Cui, Yongzhou Xiang, Kun Rong, R. Feris, Liangliang Cao
We propose a spatial-color layout feature specially designed for galaxy images. Inspired by findings on galaxy formation and evolution from Astronomy, the proposed feature captures both global and local morphological information of galaxies. In addition, our feature is scale and rotation invariant. By developing a hashing-based approach with the proposed feature, we implemented an efficient galaxy image retrieval system on a dataset with more than 280 thousand galaxy images from the Sloan Digital Sky Survey project. Given a query image, the proposed system can rank-order all galaxies from the dataset according to relevance in only 35 milliseconds on a single PC. To the best of our knowledge, this is one of the first works on galaxy-specific feature design and large-scale galaxy image retrieval. We evaluated the performance of the proposed feature and the galaxy image retrieval system using web user annotations, showing that the proposed feature outperforms other classic features, including HOG, Gist, LBP, and Color-histograms. The success of our retrieval system demonstrates the advantages of leveraging computer vision techniques in Astronomy problems.
DOI: 10.1109/WACV.2014.6836098 (https://doi.org/10.1109/WACV.2014.6836098); published 2014-03-24
Citations: 3
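The galaxy-retrieval abstract above mentions a hashing-based approach but does not say which hash is used. As a rough illustration only, the sketch below assumes random-hyperplane hashing with Hamming-distance ranking; `make_hasher` and `rank_by_hamming` are hypothetical names, and the feature vectors are placeholders rather than the paper's spatial-color layout feature.

```python
import random


def make_hasher(dim, n_bits, seed=0):
    """Return a function mapping a dim-dimensional vector to an n_bits binary code.

    Assumption: random-hyperplane hashing, chosen for illustration; the
    abstract does not specify the hash function actually used.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def hash_vec(v):
        code = 0
        for plane in planes:
            # One bit per hyperplane: which side of the plane the vector lies on.
            bit = sum(a * b for a, b in zip(plane, v)) >= 0
            code = (code << 1) | int(bit)
        return code

    return hash_vec


def rank_by_hamming(query_code, codes):
    """Rank database indices by Hamming distance between binary codes."""
    return sorted(range(len(codes)),
                  key=lambda i: bin(query_code ^ codes[i]).count("1"))


# Toy usage with placeholder 4-dimensional features.
hasher = make_hasher(dim=4, n_bits=16)
database = [[1, 0, 0, 0], [0, 1, 0, 0], [-1, 0, 0, 0]]
codes = [hasher(v) for v in database]
ranking = rank_by_hamming(hasher([1, 0, 0, 0]), codes)
```

Ranking by Hamming distance over compact codes is one common way to order a whole database cheaply, which fits the abstract's claim of rank-ordering all galaxies in milliseconds, but the actual system may work differently.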
Active Clustering with Ensembles for Social structure extraction
Jeremiah R. Barr, Leonardo A. Cament, K. Bowyer, P. Flynn
We introduce a method for extracting the social network structure for the persons appearing in a set of video clips. Individuals are unknown, and are not matched against known enrollments. An identity cluster representing an individual is formed by grouping similar-appearing faces from different videos. Each identity cluster is represented by a node in the social network. Two nodes are linked if the faces from their clusters appeared together in one or more video frames. Our approach incorporates a novel active clustering technique to create more accurate identity clusters based on feedback from the user about ambiguously matched faces. The final output consists of one or more network structures that represent the social group(s), and a list of persons who potentially connect multiple social groups. Our results demonstrate the efficacy of the proposed clustering algorithm and network analysis techniques.
DOI: 10.1109/WACV.2014.6835999 (https://doi.org/10.1109/WACV.2014.6835999); published 2014-03-24
Citations: 23
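The linking rule in the abstract above (one node per identity cluster, an edge whenever two clusters appear in the same frame) can be sketched in a few lines. This is a minimal illustration, assuming the face clustering has already been done and each frame is given as a set of cluster ids; the function names are hypothetical, and treating connected components as social groups is an assumption, not necessarily the authors' network analysis.

```python
from collections import defaultdict
from itertools import combinations


def build_social_graph(frames):
    """Build an undirected co-occurrence graph.

    frames: iterable of sets of identity-cluster ids seen together in one frame.
    Returns a dict mapping each cluster id to the set of ids it co-occurs with.
    """
    graph = defaultdict(set)
    for ids in frames:
        for node in ids:
            graph[node]  # touch the key so isolated persons still become nodes
        for a, b in combinations(sorted(ids), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def social_groups(graph):
    """Return connected components, used here as a stand-in for social groups."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(component)
    return groups


frames = [{"A", "B"}, {"B", "C"}, {"D"}, {"E", "F"}]
graph = build_social_graph(frames)
groups = social_groups(graph)
```

In this toy input, B is the only link between A and C, which is the kind of connecting person the abstract says the final output lists.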
Interactive video segmentation using occlusion boundaries and temporally coherent superpixels
Radu Dondera, Vlad I. Morariu, Yulu Wang, L. Davis
We propose an interactive video segmentation system built on the basis of occlusion and long term spatio-temporal structure cues. User supervision is incorporated in a superpixel graph clustering framework that differs crucially from prior art in that it modifies the graph according to the output of an occlusion boundary detector. Working with long temporal intervals (up to 100 frames) enables our system to significantly reduce annotation effort with respect to state of the art systems. Even though the segmentation results are less than perfect, they are obtained efficiently and can be used in weakly supervised learning from video or for video content description. We do not rely on a discriminative object appearance model and allow extracting multiple foreground objects together, saving user time if more than one object is present. Additional experiments with unsupervised clustering based on occlusion boundaries demonstrate the importance of this cue for video segmentation and thus validate our system design.
DOI: 10.1109/WACV.2014.6836023 (https://doi.org/10.1109/WACV.2014.6836023); published 2014-03-24
Citations: 15
Simultaneous recognition of facial expression and identity via sparse representation
M. Mohammadi, E. Fatemizadeh, M. Mahoor
Automatic recognition of facial expression and facial identity from visual data are two challenging problems that are tied together. In the past decade, researchers have mostly tried to solve these two problems separately to come up with face identification systems that are expression-independent and facial expressions recognition systems that are person-independent. This paper presents a new framework using sparse representation for simultaneous recognition of facial expression and identity. Our framework is based on the assumption that any facial appearance is a sparse combination of identities and expressions (i.e., one identity and one expression). Our experimental results using the CK+ and MMI face datasets show that the proposed approach outperforms methods that conduct face identification and face recognition individually.
DOI: 10.1109/WACV.2014.6835986 (https://doi.org/10.1109/WACV.2014.6835986); published 2014-03-24
Citations: 9
Real-time video decolorization using bilateral filtering
Yibing Song, Linchao Bao, Qingxiong Yang
This paper presents a real-time decolorization method. Given the human visual system's preference for luminance information, the luminance should be preserved as much as possible during decolorization. As a result, the proposed decolorization method measures the amount of color contrast/detail lost when converting color to luminance. The detail loss is estimated by computing the difference between two intermediate images: one obtained by applying a bilateral filter to the original color image, and the other obtained by applying a joint bilateral filter to the original color image with its luminance as the guidance image. The estimated detail loss is then mapped to a grayscale image named residual image by minimizing the difference between the image gradients of the input color image and the objective grayscale image that is the sum of the residual image and the luminance. Apparently, the residual image will contain pixels with all zero values (that is, the two intermediate images will be the same) only when no visual detail is missing in the luminance. Unlike most previous methods, the proposed decolorization method preserves both contrast in the color image and the luminance. Quantitative evaluation shows that it is the top performer on the standard test suite. Meanwhile, it is very robust and can be directly used to convert videos while maintaining the temporal coherence. Specifically, it can convert a high-resolution video (1280 × 720) in real time (about 28 Hz) on a 3.4 GHz i7 CPU.
DOI: 10.1109/WACV.2014.6836106 (https://doi.org/10.1109/WACV.2014.6836106); published 2014-03-24
Citations: 16
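The detail-loss step described in the decolorization abstract above (bilateral filter output minus joint bilateral filter output guided by luminance) can be sketched as follows. This is a slow, pure-Python illustration under stated assumptions: Rec. 601 luminance weights, Gaussian spatial and range kernels, and arbitrary parameter values, none of which are given in the abstract. It does not cover the later mapping of the detail loss to a residual image, and it is not the paper's real-time implementation.

```python
import math


def luminance(pixel):
    """Rec. 601 luma (an assumed choice; the abstract does not name the weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def joint_bilateral(image, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Filter `image` with range weights computed on `guide`.

    image: rows of (r, g, b) tuples; guide: rows of tuples of any channel count.
    Passing guide=image gives the ordinary bilateral filter.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc, norm = [0.0, 0.0, 0.0], 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    d2 = (yy - y) ** 2 + (xx - x) ** 2
                    r2 = sum((a - b) ** 2
                             for a, b in zip(guide[yy][xx], guide[y][x]))
                    wgt = math.exp(-d2 / (2 * sigma_s ** 2)
                                   - r2 / (2 * sigma_r ** 2))
                    norm += wgt
                    for c in range(3):
                        acc[c] += wgt * image[yy][xx][c]
            row.append(tuple(v / norm for v in acc))
        out.append(row)
    return out


def detail_loss(image, **kw):
    """Per-pixel difference: bilateral(image) minus luminance-guided filter."""
    lum_guide = [[(luminance(p),) for p in row] for row in image]
    color_guided = joint_bilateral(image, image, **kw)
    lum_guided = joint_bilateral(image, lum_guide, **kw)
    return [[tuple(a - b for a, b in zip(p, q)) for p, q in zip(ra, rb)]
            for ra, rb in zip(color_guided, lum_guided)]


# An edge between two colors of (nearly) equal luminance: invisible to the
# luminance guide, so the luminance-guided filter blurs across it.
g = 0.299 / 0.587
edge = [[(1.0, 0.0, 0.0)] * 2 + [(0.0, g, 0.0)] * 2 for _ in range(4)]
loss = detail_loss(edge)
```

On a constant image the two filters agree and the loss is zero, which matches the abstract's remark that the two intermediate images coincide when no visual detail is missing in the luminance.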
Video alignment to a common reference
Rahul Dutta, B. Draper, J. Beveridge
Handheld videos include unintentional motion (jitter) and often intentional motion (pan and/or zoom). Human viewers prefer to see jitter removed, creating a smoothly moving camera. For video analysis, in contrast, aligning to a fixed stable background is sometimes preferable. This paper presents an algorithm that removes both forms of motion using a novel and efficient way of tracking background points while ignoring moving foreground points. The approach is related to image mosaicing, but the result is a video rather than an enlarged still image. It is also related to multiple object tracking approaches, but simpler since moving objects need not be explicitly tracked. The algorithm presented takes as input a video and returns one or several stabilized videos. Videos are broken into parts when the algorithm detects the background changing and it becomes necessary to fix upon a new background. Our approach assumes the person holding the camera is standing in one place and that objects in motion do not dominate the image. Our algorithm performs better than several previously published approaches when compared on 1,401 handheld videos from the recently released Point-and-Shoot Face Recognition Challenge (PASC). The source code for this algorithm is being made available.
DOI: 10.1109/WACV.2014.6836020 (https://doi.org/10.1109/WACV.2014.6836020); published 2014-03-24
Citations: 4
AutoCaption: Automatic caption generation for personal photos
Krishnan Ramnath, Simon Baker, Lucy Vanderwende, M. El-Saban, Sudipta N. Sinha, A. Kannan, N. Hassan, Michel Galley, Yi Yang, Deva Ramanan, Alessandro Bergamo, L. Torresani
AutoCaption is a system that helps a smartphone user generate a caption for their photos. It operates by uploading the photo to a cloud service where a number of parallel modules are applied to recognize a variety of entities and relations. The outputs of the modules are combined to generate a large set of candidate captions, which are returned to the phone. The phone client includes a convenient user interface that allows users to select their favorite caption, reorder, add, or delete words to obtain the grammatical style they prefer. The user can also select from multiple candidates returned by the recognition modules.
DOI: 10.1109/WACV.2014.6835988 (https://doi.org/10.1109/WACV.2014.6835988); published 2014-03-24
Citations: 26
Fully automatic 3D facial expression recognition using local depth features
Mingliang Xue, A. Mian, Wanquan Liu, Ling Li
Facial expressions form a significant part of our nonverbal communications and understanding them is essential for effective human computer interaction. Due to the diversity of facial geometry and expressions, automatic expression recognition is a challenging task. This paper deals with the problem of person-independent facial expression recognition from a single 3D scan. We consider only the 3D shape because facial expressions are mostly encoded in facial geometry deformations rather than textures. Unlike the majority of existing works, our method is fully automatic including the detection of landmarks. We detect the four eye corners and nose tip in real time on the depth image and its gradients using Haar-like features and AdaBoost classifier. From these five points, another 25 heuristic points are defined to extract local depth features for representing facial expressions. The depth features are projected to a lower dimensional linear subspace where feature selection is performed by maximizing their relevance and minimizing their redundancy. The selected features are then used to train a multi-class SVM for the final classification. Experiments on the benchmark BU-3DFE database show that the proposed method outperforms existing automatic techniques, and is comparable even to the approaches using manual landmarks.
DOI: 10.1109/WACV.2014.6835736 (https://doi.org/10.1109/WACV.2014.6835736). Published 2014-03-24 in IEEE Winter Conference on Applications of Computer Vision.
Citations: 14
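The abstract for the 3D facial expression paper says features are selected "by maximizing their relevance and minimizing their redundancy" but does not give the exact criterion. As a rough illustration only, the sketch below shows one common greedy form of that idea (relevance minus mean redundancy, both measured with mutual information on discretized features). The function names `mutual_information` and `mrmr_select` and the toy data are hypothetical and are not taken from the paper; the authors' actual procedure may differ.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy selection of k feature indices (assumed difference-form criterion).

    `features` is a list of columns, one discrete sequence per feature.
    Each step picks the feature with the highest
    relevance(feature; labels) - mean redundancy(feature; selected features).
    """
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): four expression classes, four discretized features.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
features = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # informative: separates classes {0, 1} from {2, 3}
    [0, 0, 1, 1, 0, 0, 1, 1],  # exact duplicate of the first feature (redundant)
    [0, 1, 0, 1, 0, 1, 0, 1],  # informative and complementary to the first
    [0, 0, 0, 0, 0, 0, 0, 0],  # constant, carries no information
]
selected = mrmr_select(features, labels, k=2)
print(selected)
```

On this toy input the duplicate and the constant feature are passed over: once one copy of the first feature is chosen, its twin scores near zero because its redundancy cancels its relevance, so the complementary feature is picked instead. In a pipeline like the one described, the selected columns would then be passed to a multi-class classifier.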