Latest publications: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Using closed captions to train activity recognizers that improve video retrieval
S. Gupta, R. Mooney
Recognizing activities in real-world videos is a difficult problem, exacerbated by background clutter, changes in camera angle and zoom, rapid camera movements, and so on. Large corpora of labeled videos can be used to train automated activity recognition systems, but labeling requires expensive human labor and time. This paper explores how the closed captions that naturally accompany many videos can act as weak supervision, allowing 'labeled' data for activity recognition to be collected automatically. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an ongoing activity. We demonstrate that combining linguistic analysis with automatically trained activity recognizers can significantly improve the precision of video retrieval.
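A minimal sketch of the weak-supervision idea, in Python: a clip inherits an activity label whenever a caption mentioning that activity falls inside the clip's time window. The keyword table, the clip window, and all names are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical keyword lists for a few soccer activities (illustrative only).
ACTIVITY_KEYWORDS = {
    "kick": {"kick", "kicks", "kicked"},
    "save": {"save", "saves", "saved"},
    "throw": {"throw", "throws", "threw"},
}

def weak_labels(captions, clip_start, clip_end):
    """Collect weak activity labels for the clip [clip_start, clip_end].

    captions: iterable of (timestamp_seconds, text) pairs.
    """
    labels = set()
    for t, text in captions:
        if clip_start <= t <= clip_end:
            words = set(text.lower().split())
            for activity, keywords in ACTIVITY_KEYWORDS.items():
                if words & keywords:
                    labels.add(activity)
    return labels

captions = [(12.0, "What a kick by the striker"), (14.5, "And the keeper saves it")]
print(weak_labels(captions, 10.0, 20.0))  # {'kick', 'save'}
```

In the system described above, a caption classifier would additionally filter out comments that do not describe an ongoing activity before such labels are trusted.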
Citations: 16
An implicit spatiotemporal shape model for human activity localization and recognition
A. Oikonomopoulos, I. Patras, M. Pantic
In this paper we address the problem of localization and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is an implicit representation of the spatiotemporal shape of the activity that relies on the spatiotemporal localization of characteristic, sparse 'visual words' and 'visual verbs'. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, taking the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the training frames in which they occur, are then stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use mean shift mode estimation to spatially segment the subject performing the activities in every frame, and the Radon transform to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.
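A minimal sketch of the spatiotemporal voting step, assuming each observed codeword casts votes for the activity centre using the (dx, dy, dt) offsets stored for it during training. The grid resolution and all names are illustrative, and the real scheme weights votes probabilistically rather than uniformly.

```python
import numpy as np

def vote_for_centre(observations, offsets, grid_shape):
    """observations: (codeword_id, x, y, t) tuples; offsets: codeword_id ->
    list of (dx, dy, dt) displacements to the activity centre seen in training."""
    votes = np.zeros(grid_shape)
    for cw, x, y, t in observations:
        for dx, dy, dt in offsets.get(cw, []):
            cx, cy, ct = x - dx, y - dy, t - dt
            if 0 <= cx < grid_shape[0] and 0 <= cy < grid_shape[1] and 0 <= ct < grid_shape[2]:
                votes[cx, cy, ct] += 1.0   # the real scheme adds a probability
    return votes

# Two codewords whose training offsets agree on the same centre (20, 20, 10).
offsets = {7: [(-3, 0, -2)], 9: [(4, 1, 0)]}
obs = [(7, 17, 20, 8), (9, 24, 21, 10)]
votes = vote_for_centre(obs, offsets, grid_shape=(64, 64, 32))
print(np.unravel_index(votes.argmax(), votes.shape))  # -> (20, 20, 10)
```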
Citations: 35
Robust feature matching in 2.3µs
S. Taylor, E. Rosten, T. Drummond
In this paper we present a robust feature matching scheme in which features can be matched in 2.3µs. For a typical task involving 150 features per image, this results in a processing time of 500µs for feature extraction and matching. In order to achieve very fast matching we use simple features based on histograms of pixel intensities and an indexing scheme based on their joint distribution. The features are stored with a novel bit mask representation which requires only 44 bytes of memory per feature and allows computation of a dissimilarity score in 20ns. A training phase gives the patch-based features invariance to small viewpoint variations. Larger viewpoint variations are handled by training entirely independent sets of features from different viewpoints. A complete system is presented where a database of around 13,000 features is used to robustly localise a single planar target in just over a millisecond, including all steps from feature detection to model fitting. The resulting system shows comparable robustness to SIFT [8] and Ferns [14] while using a tiny fraction of the processing time, and in the latter case a fraction of the memory as well.
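A minimal sketch of a bit-mask dissimilarity in this spirit: each feature stores, per sample point, a small mask of intensity bins, and matching reduces to one bitwise AND plus a popcount. The bin count, sample count, and packing are illustrative, not the paper's exact 44-byte layout.

```python
N_SAMPLES = 16   # sample points per patch (illustrative; the paper uses more)
N_BINS = 5       # intensity bins per sample point

def pack(per_sample_masks):
    """Pack per-sample bin masks (each < 2**N_BINS) into one big integer."""
    packed = 0
    for i, m in enumerate(per_sample_masks):
        packed |= m << (i * N_BINS)
    return packed

def dissimilarity(query_bits, reference_mask):
    """query_bits has exactly one bit set per sample (the dominant bin);
    count the samples whose dominant bin is NOT set in the reference mask."""
    hits = query_bits & reference_mask
    return N_SAMPLES - bin(hits).count("1")

query = pack([1 << (i % N_BINS) for i in range(N_SAMPLES)])
reference = pack([0b11111] * N_SAMPLES)   # reference that allows every bin
print(dissimilarity(query, reference))    # -> 0: perfect match
```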
Citations: 78
Learning a hierarchical compositional representation of multiple object classes
A. Leonardis
Summary form only given. Visual categorization, recognition, and detection of objects has been an active research area in the vision community for decades. Ultimately, the goal is to recognize and detect a large number of object classes in images within an acceptable time frame. This problem entangles three highly interconnected issues: an internal object representation that should grow sublinearly with the number of classes, a means of learning that representation from a set of images, and an effective inference algorithm that matches the object representation against the representation produced from the scene. In the main part of the talk I will present our framework for learning a hierarchical compositional representation of multiple object classes. Learning is unsupervised, statistical, and performed bottom-up. The approach takes simple contour fragments and learns their frequent spatial configurations, which recursively combine into increasingly complex and class-specific contour compositions.
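A minimal sketch of the bottom-up statistics involved, assuming detected part occurrences per training image: count how often pairs of part types co-occur at a quantized relative offset and keep the frequent configurations as candidate compositions for the next layer. All names, the cell size, and the threshold are illustrative assumptions, not the talk's actual learning procedure.

```python
from collections import Counter

def frequent_pairs(images, cell=8, min_count=2):
    """images: one list of detected (part_type, x, y) triples per image."""
    counts = Counter()
    for parts in images:
        for a in parts:
            for b in parts:
                if a is not b:
                    t1, x1, y1 = a
                    t2, x2, y2 = b
                    rel = ((x2 - x1) // cell, (y2 - y1) // cell)
                    counts[(t1, t2, rel)] += 1
    return [cfg for cfg, n in counts.items() if n >= min_count]

imgs = [[("edge_h", 10, 10), ("edge_v", 26, 10)],
        [("edge_h", 40, 40), ("edge_v", 57, 41)]]
print(frequent_pairs(imgs))  # -> [('edge_h', 'edge_v', (2, 0))]
```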
Citations: 0
Towards automated large scale discovery of image families
M. Aly, P. Welinder, Mario E. Munich, P. Perona
Gathering large collections of images is quite easy nowadays with the advent of image-sharing Web sites such as flickr.com. However, such collections inevitably contain duplicates and highly similar images, which we refer to as image families. Automatic discovery and cataloguing of such similar images in large collections is important for many applications, e.g., image search, image-collection visualization, and research purposes. In this work, we investigate the problem by thoroughly comparing two broad approaches to measuring image similarity: global versus local features. We assess their performance as the image collection scales up to over 11,000 images with over 6,300 families. We present results on three datasets with different statistics, including two new challenging datasets. Moreover, we present a new algorithm that automatically determines the number of families in the collection, with promising results.
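A minimal sketch of the global-feature route under simple assumptions: a normalized grey-level histogram as the global descriptor, histogram intersection as the similarity, and connected components over a similarity threshold as the family grouping. The descriptor, threshold, and grouping rule are illustrative stand-ins, not the paper's algorithm for choosing the number of families.

```python
import numpy as np

def histogram_feature(image, bins=32):
    """Normalized grey-level histogram as a cheap global descriptor."""
    h, _ = np.histogram(image, bins=bins, range=(0, 256))
    return h / h.sum()

def families(features, threshold=0.8):
    """Group indices whose histogram-intersection similarity exceeds the
    threshold, via a small union-find (connected components)."""
    n = len(features)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            sim = np.minimum(features[i], features[j]).sum()
            if sim >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64))
near_dup = np.clip(img + 1, 0, 255)     # near-duplicate of img
dark = rng.integers(0, 64, (64, 64))    # unrelated, much darker image
feats = [histogram_feature(x) for x in (img, near_dup, dark)]
print(families(feats))                  # -> [[0, 1], [2]]
```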
Citations: 23
Feature based person detection beyond the visible spectrum
K. Jüngling, Michael Arens
One of the main challenges in computer vision is the automatic detection of specific object classes in images. Recent advances in object detection performance in the visible spectrum encourage the application of these approaches to data beyond the visible spectrum. In this paper, we show the applicability of a well-known, local-feature-based object detector to the case of people detection in thermal data. We adapt the detector to the special conditions of infrared data and show the specifics relevant to feature-based object detection. For that, we employ the SURF feature detector and descriptor, which is well suited to infrared data. We evaluate the performance of our adapted object detector on the task of person detection in different real-world scenarios where people occur at multiple scales. Finally, we show how this local-feature-based detector can be used to recognize specific object parts, i.e., body parts of detected people.
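A minimal sketch of the feature-extraction step, assuming an OpenCV build whose contrib module still ships SURF (cv2.xfeatures2d); the synthetic frame and the Hessian threshold are illustrative. The extracted keypoints and descriptors would then feed the kind of feature-based detector described above.

```python
import cv2
import numpy as np

# Synthetic stand-in for an 8-bit thermal frame: a warm (bright) blob on a
# cool (dark) background. A real system would use the camera frame instead.
frame = np.zeros((240, 320), dtype=np.uint8)
cv2.circle(frame, (160, 120), 30, 255, -1)

# SURF lives in the contrib package (opencv-contrib-python) and is absent
# from some builds for patent reasons.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints, descriptors = surf.detectAndCompute(frame, None)
print(f"{len(keypoints)} SURF keypoints detected")
```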
Citations: 62
Accurate estimation of pulmonary nodule's growth rate in CT images with nonrigid registration and precise nodule detection and segmentation
Yuanjie Zheng, C. Kambhamettu, T. Bauer, K. Steiner
We propose a new tumor growth measure for pulmonary nodules in CT images that accounts for the tumor deformation caused by differences in inspiration level. It is accomplished with a new nonrigid lung registration process that handles the tumor expansion/shrinkage problem arising in many conventional nonrigid registration methods. The accurate nonrigid registration is performed by weighting the matching cost of each voxel, based on the results of a new nodule detection approach and a powerful nodule segmentation algorithm. Comprehensive experiments show the high accuracy of our algorithms and the promising results of our new tumor growth measure.
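For context, a common volumetric growth measure that such a pipeline can feed once the nodule is segmented in both scans is the exponential-growth doubling time, DT = Δt · ln 2 / ln(V2/V1). The sketch below computes it; this is the classical measure, not the deformation-aware measure proposed in the paper, and the volumes are illustrative.

```python
import math

def doubling_time_days(v1_mm3, v2_mm3, days_between):
    """Days for the nodule volume to double, assuming exponential growth."""
    return days_between * math.log(2.0) / math.log(v2_mm3 / v1_mm3)

# Illustrative volumes from two scans 90 days apart.
print(round(doubling_time_days(500.0, 650.0, 90.0), 1))  # -> 237.8
```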
Citations: 16
On conversion from color to gray-scale images for face detection
Juwei Lu, K. Plataniotis
The paper presents a study of color-to-grayscale image conversion from a novel point of view: face detection. To the best of the authors' knowledge, this specific topic has not been investigated before. Our work reveals that the standard NTSC conversion is not optimal for face detection tasks, although it may be best for displaying pictures on monochrome televisions. Experiments with two AdaBoost-based face detection systems further show that detection rates can vary by up to 10% simply by changing the parameters of the RGB-to-gray conversion, while such changes have little influence on the false-positive rates. Compared with the standard NTSC conversion, the detection rate under the best parameter setting found is 2.85% and 3.58% higher for the two evaluated face detection systems, respectively. Promisingly, this work suggests a new approach to color-to-gray conversion that could very easily be incorporated into most existing face detection systems to improve accuracy without any extra computational cost.
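A minimal sketch of the conversion being varied: a convex combination g = a·R + b·G + c·B, with the standard NTSC weights (0.299, 0.587, 0.114) as one setting among many. The alternative weights shown are illustrative, not the best-performing setting reported in the paper.

```python
import numpy as np

def to_gray(rgb, weights=(0.299, 0.587, 0.114)):
    """rgb: H x W x 3 array; returns an 8-bit grayscale image."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                               # keep the combination convex
    return (rgb.astype(np.float64) @ w).round().astype(np.uint8)

rgb = np.random.default_rng(0).integers(0, 256, (4, 4, 3), dtype=np.uint8)
ntsc = to_gray(rgb)                               # the standard NTSC setting
flat = to_gray(rgb, weights=(1.0, 1.0, 1.0))      # equal weights, for comparison
print(np.abs(ntsc.astype(int) - flat.astype(int)).max())  # the settings differ
```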
Citations: 29
A syntax for image understanding
N. Ahuja
We consider one of the most basic questions in computer vision, that of finding a low-level image representation that could be used to seed diverse, subsequent computations of image understanding. Can we define a relatively general purpose image representation which would serve as the syntax for diverse needs of image understanding? What makes good image syntax? How do we evaluate it? We pose a series of such questions and evolve a set of answers to them, which in turn help evolve an image representation. For concreteness, we first perform this exercise in the specific context of the following problem.
Citations: 0
Multi-view reconstruction for projector camera systems based on bundle adjustment
Furukawa Ryo, K. Inose, Hiroshi Kawasaki
Range scanners using projector-camera systems have been actively studied in recent years as methods for measuring 3D shapes accurately and cost-effectively. To acquire the entire 3D shape of an object with such systems, the shape must be captured from multiple directions and the set of captured shapes aligned using algorithms such as ICP. The aligned shapes are then integrated into a single 3D shape model. However, the captured shapes are often distorted due to errors in the intrinsic or extrinsic parameters of the camera and the projector. Because of these distortions, gaps between overlapping surfaces remain even after aligning the 3D shapes. In this paper, we propose a new method to capture an entire shape with high precision, using an active stereo range scanner consisting of a projector and a camera with fixed relative positions. In the proposed method, minimization of the calibration errors of the projector-camera pair and of the registration errors between 3D shapes from different viewpoints is achieved simultaneously. The proposed method can be considered a variation of bundle adjustment adapted to projector-camera systems. Since acquiring correspondences between different views is not easy for projector-camera systems, a solution to that problem is also presented.
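A minimal bundle-adjustment sketch in this spirit, assuming a plain pinhole camera with known focal length: per-view poses and 3D points are refined jointly by minimizing reprojection error with scipy.optimize.least_squares. A real projector-camera system would add the projector's reprojection term and the calibration parameters; the toy data and all names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

F = 500.0  # assumed known focal length; principal point at the origin

def rotate(points, rvec):
    """Rodrigues rotation of an N x 3 array by the axis-angle vector rvec."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return points
    k = rvec / theta
    c, s = np.cos(theta), np.sin(theta)
    return (points * c + np.cross(k, points) * s
            + k[None, :] * (points @ k)[:, None] * (1.0 - c))

def residuals(params, n_views, n_pts, observations):
    """Reprojection residuals for all (view, point, measured uv) observations."""
    poses = params[:n_views * 6].reshape(n_views, 6)   # rvec (3) + t (3) per view
    pts = params[n_views * 6:].reshape(n_pts, 3)
    res = []
    for view, pt, uv in observations:
        p = rotate(pts[pt:pt + 1], poses[view, :3])[0] + poses[view, 3:]
        res.extend(F * p[:2] / p[2] - uv)
    return np.asarray(res)

# Toy problem: 2 views of 4 points, observations synthesised from ground truth.
rng = np.random.default_rng(1)
gt_pts = rng.uniform(-1.0, 1.0, (4, 3)) + np.array([0.0, 0.0, 5.0])
gt_poses = np.zeros((2, 6))
gt_poses[1, 3] = 0.5                                   # second view shifted in x
obs = [(v, i, F * q[:2] / q[2])
       for v in range(2)
       for i, q in enumerate(rotate(gt_pts, gt_poses[v, :3]) + gt_poses[v, 3:])]

x0 = np.concatenate([gt_poses.ravel(), (gt_pts + 0.05).ravel()])  # perturbed start
sol = least_squares(residuals, x0, args=(2, 4, obs))
print("final cost:", sol.cost)  # near zero once poses and points are refined
```

The joint formulation is the key design choice: calibration-like pose errors and registration errors are reduced in the same least-squares problem rather than in separate stages.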
Citations: 8