
Proceedings Ninth IEEE International Conference on Computer Vision: Latest Publications

Fast vehicle detection with probabilistic feature grouping and its application to vehicle tracking
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238392
Zuwhan Kim, Jitendra Malik
Generating vehicle trajectories from video data is an important application of intelligent transportation systems (ITS). We introduce a new tracking approach that uses a model-based 3-D vehicle detection and description algorithm. Our vehicle detection and description algorithm is based on probabilistic line-feature grouping, and it is faster (by up to an order of magnitude) and more flexible than previous image-based algorithms. We present the system implementation and the vehicle detection and tracking results.
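As a rough illustration of the line-grouping idea (not the authors' actual formulation), the sketch below scores pairs of detected line segments by how likely their endpoint gap and orientation difference are under a Gaussian noise model, then greedily links high-scoring pairs; the noise scales, threshold, and segment format are all illustrative assumptions.

```python
# Hypothetical sketch of probabilistic line-feature grouping: score segment
# pairs by the likelihood of their endpoint gap and orientation difference
# under Gaussian noise, then link pairs whose score clears a threshold.
import math

def pair_log_likelihood(seg_a, seg_b, sigma_gap=4.0, sigma_angle=0.1):
    """Unnormalized log-likelihood that segments (x1, y1, x2, y2) group."""
    ax1, ay1, ax2, ay2 = seg_a
    bx1, by1, bx2, by2 = seg_b
    gap = math.hypot(bx1 - ax2, by1 - ay2)              # endpoint-to-endpoint gap
    ang_a = math.atan2(ay2 - ay1, ax2 - ax1)
    ang_b = math.atan2(by2 - by1, bx2 - bx1)
    dang = math.atan2(math.sin(ang_b - ang_a), math.cos(ang_b - ang_a))
    return -((gap / sigma_gap) ** 2) - ((dang / sigma_angle) ** 2)

def group_segments(segments, threshold=-4.0):
    """Greedily link segment pairs whose grouping score clears the threshold."""
    links = []
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            if pair_log_likelihood(a, b) > threshold:
                links.append((a, b))
    return links

# Two nearly collinear segments group; a perpendicular, distant one does not.
print(group_segments([(0, 0, 10, 0), (12, 1, 22, 1), (5, 5, 5, 15)]))
```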
Citations: 186
A novel approach for texture shape recovery
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238650
Jing Wang, Kristin J. Dana
In vision and graphics, there is sustained interest in capturing accurate 3D shape with various scanning devices. However, the resulting geometric representation is only part of the story. Surface texture of real objects is also an important component of the representation, and fine-scale surface geometry, such as surface markings, roughness, and imprints, is essential for highly realistic rendering and accurate prediction. We present a novel approach for measuring the fine-scale surface shape of specular surfaces, using a curved mirror to view multiple angles in a single image. A distinguishing aspect of our method is that it is designed for specular surfaces, unlike many methods (e.g. laser scanning) that cannot handle highly specular objects. Also, the spatial resolution is high enough to resolve very small surface details beyond the resolution of standard devices. Furthermore, our approach incorporates the simultaneous use of a bidirectional texture measurement method, so that spatially varying bidirectional reflectance is measured at the same time as surface shape.
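As a small geometric illustration (our assumption, not the paper's calibration procedure), the sketch below shows why a curved mirror yields multiple viewing angles in one image: each mirror normal reflects the fixed camera ray into a different direction via the standard reflection formula r = d - 2(d·n)n. The spherical-cap parameterization is hypothetical.

```python
# Each point on a curved mirror reflects the fixed camera ray into a distinct
# viewing direction, so a single image samples many angles of the sample.
import numpy as np

def reflected_direction(d, n):
    """Reflect incoming unit ray d about unit surface normal n."""
    return d - 2.0 * np.dot(d, n) * n

d = np.array([0.0, 0.0, -1.0])                  # camera looks along -z
for theta in np.linspace(0.0, 0.4 * np.pi, 5):  # polar angle on the mirror cap
    n = np.array([np.sin(theta), 0.0, np.cos(theta)])
    print(f"theta={theta:.2f}  view direction={reflected_direction(d, n)}")
```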
Citations: 18
A multi-scale generative model for animate shapes and parts
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238350
A. Dubinskiy, Song-Chun Zhu
We present a multiscale generative model for representing animate shapes and extracting meaningful parts of objects. The model assumes that animate shapes (2D simple closed curves) are formed by a linear superposition of a number of shape bases. These shape bases resemble the multiscale Gabor bases in image pyramid representation, are well localized in both the spatial and frequency domains, and form an over-complete dictionary. This model is simpler than the popular B-spline representation since it does not engage a domain partition. Thus it eliminates the interference between adjacent B-spline bases and becomes a true linear additive model. We pursue the bases by reconstructing the shape in a coarse-to-fine procedure through curve evolution. These shape bases are further organized in a tree structure, where the bases in each subtree sum up to an intuitive part of the object. To build a probabilistic model for a class of objects, we propose a Markov random field model at each level of the tree representation to account for the spatial relationship between bases. Thus the final model integrates a Markov tree (generative) model over scales and a Markov random field over space. We adopt an EM-type algorithm for learning the meaningful parts of a shape class, and show some results on shape synthesis.
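A minimal numerical sketch of the "linear superposition of shape bases" idea, under assumptions: a closed 2-D curve is built by adding localized, Gabor-like bumps (Gaussian-windowed sinusoids over the curve parameter) as radial perturbations of a circle. The basis placements, scales, and coefficients are made up for illustration, not learned from data.

```python
# Build an "animate shape" as a circle plus a linear superposition of
# localized Gabor-like bases defined over the curve parameter t in [0, 1).
import numpy as np

def gabor_basis(t, center, scale, freq):
    """Gaussian window (localized in space) times a sinusoid (localized in
    frequency), evaluated at wrapped distance from the basis center."""
    d = np.angle(np.exp(2j * np.pi * (t - center))) / (2 * np.pi)
    return np.exp(-(d / scale) ** 2) * np.cos(2 * np.pi * freq * d)

t = np.linspace(0.0, 1.0, 256, endpoint=False)
base = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])  # unit circle

# Superpose a few bases as radial perturbations (coefficients arbitrary).
radius = 1.0
for center, scale, freq, coeff in [(0.1, 0.05, 2.0, 0.30),
                                   (0.5, 0.10, 1.0, -0.20),
                                   (0.8, 0.03, 3.0, 0.15)]:
    radius = radius + coeff * gabor_basis(t, center, scale, freq)

shape = base * radius      # 2 x 256 array of (x, y) points on the closed curve
print(shape.shape)         # (2, 256)
```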
Citations: 22
Outlier correction in image sequences for the affine camera
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238400
D. Huynh, R. Hartley, A. Heyden
It is widely known that, for the affine camera model, both shape and motion can be factorized directly from the so-called image measurement matrix constructed from image point coordinates. The ability to extract both shape and motion from this matrix by a single SVD operation makes this shape-from-motion approach attractive; however, it cannot deal with missing feature points and, in the presence of outliers, a direct SVD of the matrix would yield highly unreliable shape and motion components. Here, we present an outlier correction scheme that iteratively updates the elements of the image measurement matrix. The magnitude and sign of the update to each element depend upon the residual robustly estimated in each iteration. The result is that outliers are corrected and retained, giving improved reconstruction and smaller reprojection errors. Our iterative outlier correction scheme has been applied to both synthesized and real video sequences. The results obtained are remarkably good.
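The sketch below illustrates the flavor of such a scheme, under assumptions: fit a rank-3 approximation to the centroid-registered measurement matrix by SVD, estimate a robust residual scale via the median absolute deviation, and move only the outlying entries toward the low-rank prediction, taking the sign and magnitude of each update from its residual. The step size, 3-sigma rule, and MAD scale are our illustrative choices, not the paper's exact update.

```python
# Iterative outlier correction against a low-rank (affine factorization) fit.
import numpy as np

def correct_outliers(W, rank=3, n_iter=20, step=0.5):
    """W: 2F x P image measurement matrix (x rows stacked over y rows).
    Returns a centroid-registered matrix with outlying entries corrected."""
    W = W - W.mean(axis=1, keepdims=True)              # register to centroid
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        W_fit = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r fit
        R = W - W_fit                                  # residuals
        scale = 1.4826 * np.median(np.abs(R))          # robust (MAD) scale
        outliers = np.abs(R) > 3.0 * scale
        # Nudge only outlying entries toward the low-rank prediction; the
        # sign and magnitude of the update follow the residual.
        W[outliers] -= step * R[outliers]
    return W
```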
Citations: 37
Using temporal coherence to build models of animals
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238364
Deva Ramanan, D. Forsyth
We describe a system that can build appearance models of animals automatically from a video sequence of the relevant animal, with no explicit supervisory information. The video sequence need not have any form of special background. Animals are modeled as a 2D kinematic chain of rectangular segments, where the number of segments and the topology of the chain are unknown. The system detects possible segments, clusters segments whose appearance is coherent over time, and then builds a spatial model of such segment clusters. The resulting representation of the spatial configuration of the animal in each frame can be seen either as a track, in which case the system described should be viewed as a generalized tracker capable of modeling objects while tracking them, or as the source of an appearance model that can be used to build detectors for the particular animal. This is because knowing that a video sequence is temporally coherent, i.e. that a particular animal is present throughout the sequence, is a strong supervisory signal. The method is shown to be successful as a tracker on video sequences of real scenes showing three different animals. For the same reasons that it succeeds as a tracker, the method yields detectors that can find each animal fairly reliably within the Corel collection of images.
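As a toy sketch of "cluster segments whose appearance is coherent over time" (our simplification, not the paper's clustering), the code below pools candidate segments from all frames, describes each by a mean-color vector, and groups them with a simple online leader-clustering rule; the descriptor and tolerance are hypothetical.

```python
# Group candidate segments by appearance: segments whose mean colors stay
# close across frames end up in the same cluster.
import numpy as np

def cluster_by_appearance(descriptors, tol=20.0):
    """descriptors: N x 3 array of per-segment mean colors, pooled across
    frames. Returns one cluster label per segment (leader clustering)."""
    labels = -np.ones(len(descriptors), dtype=int)
    centers = []
    for i, d in enumerate(descriptors):
        if centers:
            dist = np.linalg.norm(np.array(centers) - d, axis=1)
            j = int(np.argmin(dist))
            if dist[j] < tol:
                labels[i] = j
                continue
        centers.append(d)                 # start a new cluster with d as leader
        labels[i] = len(centers) - 1
    return labels

descs = np.array([[200, 60, 50], [205, 58, 52], [30, 90, 180], [198, 62, 47]])
print(cluster_by_appearance(descs))       # -> [0 0 1 0]
```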
Citations: 73
Shape gradients for histogram segmentation using active contours
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238375
S. Jehan-Besson, M. Barlaud, G. Aubert, O. Faugeras
We consider the problem of image segmentation using active contours through the minimization of an energy criterion involving both region and boundary functionals. These functionals are derived through a shape derivative approach instead of the classical calculus of variations. The equations can be elegantly derived without converting the region integrals into boundary integrals. From the derivative, we deduce the evolution equation of an active contour that makes it evolve towards a minimum of the criterion. We focus particularly on statistical features globally attached to the region, especially the probability density functions of image features such as the color histogram of a region. A theoretical framework is established for the minimization of the distance between two histograms for matching or tracking purposes. An application of this framework to the segmentation of color histograms in video sequences is then proposed. We briefly describe our numerical scheme and show some experimental results.
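A minimal sketch of the criterion such a contour minimizes, under assumptions: the distance between the histogram of the region inside the contour and a reference histogram. The chi-square distance below is one common choice and our assumption; the paper's shape-derivative framework covers general region functionals, and the contour evolution itself is not shown.

```python
# The region criterion: a distance between the interior histogram and a
# reference histogram. The contour evolution (not shown) would deform the
# region to reduce this value, with boundary speeds from the shape derivative.
import numpy as np

def region_histogram(pixels, bins=16):
    """pixels: array of intensities in [0, 255] -> normalized histogram."""
    h, _ = np.histogram(pixels, bins=bins, range=(0, 255))
    return h / max(h.sum(), 1)

def chi_square(h1, h2, eps=1e-9):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

a = region_histogram(np.array([10, 12, 200, 210]))
b = region_histogram(np.array([11, 13, 205, 208]))
print(chi_square(a, b))    # small distance: the two regions look alike
```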
Citations: 64
Facial expression understanding in image sequences using dynamic and active visual information fusion
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238640
Yongmian Zhang, Q. Ji
This paper explores the use of a multisensory information fusion technique with dynamic Bayesian networks (DBNs) for modeling and understanding the temporal behaviors of facial expressions in image sequences. Our approach to facial expression understanding rests on a probabilistic framework that integrates the DBNs with facial action units (AUs) from the psychological view. The DBNs provide a coherent and unified hierarchical probabilistic framework to represent spatial and temporal information related to facial expressions, and to actively select the most informative visual cues from the available information to minimize the ambiguity in recognition. The recognition of facial expressions is accomplished by fusing not only the current visual observations but also the previous visual evidence. Consequently, the recognition becomes more robust and accurate through modeling the temporal behavior of facial expressions. Experimental results demonstrate that our approach is well suited to facial expression analysis in image sequences.
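A hedged sketch of the temporal fusion in its simplest special case: forward filtering in a two-state dynamic model (a DBN reduced to an HMM), where the belief over the hidden expression state combines the previous evidence, propagated through the transition model, with the current observation likelihood. All probabilities and the toy observation sequence are invented for illustration.

```python
# Forward filtering: fuse prior temporal evidence with the current observation.
import numpy as np

transition = np.array([[0.9, 0.1],     # P(state_t = j | state_{t-1} = i)
                       [0.2, 0.8]])
likelihood = np.array([[0.7, 0.3],     # P(observation = o | state = i)
                       [0.2, 0.8]])

def filter_step(belief, obs):
    """Predict with the transition model, then weight by the observation
    likelihood and renormalize."""
    predicted = transition.T @ belief
    updated = likelihood[:, obs] * predicted
    return updated / updated.sum()

belief = np.array([0.5, 0.5])
for obs in [0, 0, 1, 1, 1]:            # a toy AU-observation sequence
    belief = filter_step(belief, obs)
print(belief)                           # posterior over expression states
```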
Citations: 43
Automatic video summarization by graph modeling
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238320
C. Ngo, Yu-Fei Ma, HongJiang Zhang
We propose a unified approach to summarization based on the analysis of video structures and video highlights. Our approach emphasizes both the content balance and the perceptual quality of a summary. A normalized cut algorithm is employed to globally and optimally partition a video into clusters. A motion attention model based on human perception is employed to compute the perceptual quality of shots and clusters. The clusters, together with the computed attention values, form a temporal graph, similar to a Markov chain, that inherently describes the evolution and perceptual importance of video clusters. In our application, the flow of the temporal graph is utilized to group similar clusters into scenes, while the attention values are used as guidelines to select appropriate subshots in scenes for summarization.
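A minimal sketch of the normalized-cut step used for partitioning, under assumptions: given a small affinity matrix over shots (values invented here), split by the sign of the second generalized eigenvector of (D - W)x = λDx, the standard spectral relaxation of the normalized cut.

```python
# Normalized-cut bipartition via the generalized eigenproblem (D - W) x = l D x.
import numpy as np
from scipy.linalg import eigh

W = np.array([[0.0, 0.9, 0.1, 0.0],    # pairwise shot similarities (toy values)
              [0.9, 0.0, 0.2, 0.1],
              [0.1, 0.2, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
D = np.diag(W.sum(axis=1))

# eigh solves the generalized problem with eigenvalues in ascending order;
# the second eigenvector (Fiedler vector) carries the partition.
vals, vecs = eigh(D - W, D)
partition = vecs[:, 1] > 0
print(partition)                        # splits shots into {0, 1} vs {2, 3}
```

Thresholding the Fiedler vector at zero is a simplification; a full normalized-cut implementation would search over splitting thresholds and recurse.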
Citations: 155
Fragmentation in the vision of scenes
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238326
J. Geusebroek, A. Smeulders
Natural images are highly structured in their spatial configuration. Although one would expect a different spatial distribution for every image, as each image has a different spatial layout, we show that the spatial statistics of recorded images can be explained by a single process of sequential fragmentation. Observation by a resolution-limited sensory system turns out to have a profound influence on the observed statistics of natural images. The power-law and normal distributions represent the extreme cases of sequential fragmentation. Between these two extremes, spatial detail statistics deform from power-law to normal through the Weibull-type distribution as receptive field size increases relative to image detail size.
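A small empirical sketch of the claim, under assumptions: fit a two-parameter Weibull to the gradient-magnitude statistics of an image and inspect the shape parameter, which the paper argues moves between the power-law-like and near-normal regimes as receptive field size grows relative to detail size. The synthetic image and the SciPy fit are our illustrative stand-ins for real data.

```python
# Fit a Weibull to spatial detail statistics (gradient magnitudes).
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
image = rng.normal(size=(128, 128)).cumsum(axis=0).cumsum(axis=1)  # toy image
gx, gy = np.gradient(image)
magnitudes = np.hypot(gx, gy).ravel()

# Two-parameter Weibull with location fixed at 0. Small shape values give
# heavy, power-law-like tails; a shape near 3.6 closely approximates a normal.
shape, loc, scale = weibull_min.fit(magnitudes, floc=0)
print(f"Weibull shape={shape:.2f}, scale={scale:.2f}")
```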
Citations: 33
Graph partition by Swendsen-Wang cuts
Pub Date : 2003-10-13 DOI: 10.1109/ICCV.2003.1238362
Adrian Barbu, Song-Chun Zhu
Vision tasks, such as segmentation, grouping, and recognition, can be formulated as graph partition problems. The recent literature has witnessed two popular graph cut algorithms: the Ncut using spectral graph analysis and the minimum cut using the maximum flow algorithm. We present a third major approach by generalizing the Swendsen-Wang method, a celebrated algorithm in statistical mechanics. Our algorithm simulates ergodic, reversible Markov chain jumps in the space of graph partitions to sample a posterior probability. At each step, the algorithm splits, merges, or regroups a sizable subgraph, and achieves fast mixing at low temperature, enabling a fast annealing procedure. Experiments show it converges in 2-30 seconds on a PC for image segmentation. This is 400 times faster than the single-site update Gibbs sampler, and 20-40 times faster than the DDMCMC algorithm. The algorithm can optimize over the number of models and works for general forms of posterior probabilities, so it is more general than the existing graph cut approaches.
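For background, a hedged sketch of the original Swendsen-Wang move that the paper generalizes (the Ising-model version, not the authors' graph-partition sampler): bonds between equal spins are frozen with probability 1 - exp(-2β), and each resulting connected component is flipped as a block. Grid size, β, and sweep count are arbitrary choices.

```python
# Classic Swendsen-Wang sweep for an Ising model on a periodic grid.
import numpy as np

def swendsen_wang_sweep(spins, beta, rng):
    n = spins.shape[0]
    parent = list(range(n * n))                  # union-find over grid sites

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path halving
            i = parent[i]
        return i

    p_freeze = 1.0 - np.exp(-2.0 * beta)
    for x in range(n):
        for y in range(n):
            for dx, dy in ((1, 0), (0, 1)):      # right and down neighbors
                u, v = (x + dx) % n, (y + dy) % n
                if spins[x, y] == spins[u, v] and rng.random() < p_freeze:
                    parent[find(x * n + y)] = find(u * n + v)

    # Flip each connected component independently with probability 1/2.
    flip = {r: rng.random() < 0.5 for r in {find(i) for i in range(n * n)}}
    for i in range(n * n):
        if flip[find(i)]:
            spins[i // n, i % n] *= -1
    return spins

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(16, 16))
for _ in range(10):
    spins = swendsen_wang_sweep(spins, beta=0.4, rng=rng)
```

The block moves are what give the sampler its fast mixing: entire clusters change label in one step, instead of one site at a time as in a Gibbs sampler.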
Citations: 129