
2013 IEEE International Conference on Computer Vision: Latest Publications

Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.161
Jian Zhang, Chen Kan, A. Schwing, R. Urtasun
In this paper we propose an approach to jointly estimate the layout of rooms as well as the clutter present in the scene using RGB-D data. Towards this goal, we propose an effective model that is able to exploit both depth and appearance features, which are complementary. Furthermore, our approach is efficient as we exploit the inherent decomposition of additive potentials. We demonstrate the effectiveness of our approach on the challenging NYU v2 dataset and show that employing depth reduces the layout error by 6% and the clutter estimation error by 13%.
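The efficiency claim rests on the scoring function decomposing into additive potentials over the faces of a layout hypothesis. As a rough illustration of why such a decomposition helps (not the authors' actual model), the sketch below scores a candidate layout from a precomputed integral image, so each rectangular face costs O(1) to evaluate; the per-pixel feature map and the face boxes are made-up placeholders.

```python
import numpy as np

def integral_image(f):
    """Summed-area table with a zero row/column prepended."""
    s = np.zeros((f.shape[0] + 1, f.shape[1] + 1))
    s[1:, 1:] = f.cumsum(0).cumsum(1)
    return s

def box_sum(sat, y0, x0, y1, x1):
    """Sum of f over the half-open box [y0, y1) x [x0, x1) in O(1)."""
    return sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]

# Illustrative per-pixel compatibility feature (e.g. agreement between a
# depth cue and a hypothesised face orientation) -- assumed here.
H, W = 240, 320
feature = np.random.rand(H, W)
sat = integral_image(feature)

# One candidate layout described by axis-aligned face boxes (toy values).
candidate_faces = {
    "floor": (180, 0, 240, 320),
    "left_wall": (0, 0, 180, 100),
    "back_wall": (0, 100, 180, 220),
    "right_wall": (0, 220, 180, 320),
}

# Additive scoring: the total is a sum of independent per-face potentials,
# each evaluated in constant time from the integral image.
score = sum(box_sum(sat, *box) for box in candidate_faces.values())
print(f"layout score: {score:.1f}")
```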
Citations: 78
Multiview Photometric Stereo Using Planar Mesh Parameterization
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.148
Jaesik Park, Sudipta N. Sinha, Y. Matsushita, Yu-Wing Tai, In-So Kweon
We propose a method for accurate 3D shape reconstruction using uncalibrated multiview photometric stereo. A coarse mesh reconstructed using multiview stereo is first parameterized using a planar mesh parameterization technique. Subsequently, multiview photometric stereo is performed in the 2D parameter domain of the mesh, where all geometric and photometric cues from multiple images can be treated uniformly. Unlike traditional methods, there is no need for merging view-dependent surface normal maps. Our key contribution is a new photometric stereo based mesh refinement technique that can efficiently reconstruct meshes with extremely fine geometric details by directly estimating a displacement texture map in the 2D parameter domain. We demonstrate that intricate surface geometry can be reconstructed using several challenging datasets containing surfaces with specular reflections, multiple albedos and complex topologies.
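The per-texel refinement builds on the classical calibrated photometric stereo relation between light directions, observed intensities, and the surface normal. A minimal per-texel sketch under a Lambertian assumption is shown below; the light directions and intensities are synthetic placeholders, and the paper's displacement-map optimization in the 2D parameter domain is not reproduced.

```python
import numpy as np

def photometric_stereo_normal(L, I):
    """Recover a unit normal and albedo for one texel.

    L : (m, 3) array of unit light directions (calibrated).
    I : (m,)   observed intensities at this texel.
    Lambertian model: I = albedo * (L @ n), assuming front-facing lights.
    """
    g, *_ = np.linalg.lstsq(L, I, rcond=None)      # g = albedo * n
    albedo = np.linalg.norm(g)
    return g / (albedo + 1e-12), albedo

# Toy example: five lights in the upper hemisphere, noiseless rendering.
rng = np.random.default_rng(0)
L = rng.normal(size=(5, 3))
L[:, 2] = np.abs(L[:, 2]) + 0.5                    # keep lights front-facing
L /= np.linalg.norm(L, axis=1, keepdims=True)
true_n = np.array([0.0, 0.0, 1.0])
I = 0.8 * (L @ true_n)                             # albedo 0.8
normal, albedo = photometric_stereo_normal(L, I)
print(normal.round(3), round(albedo, 3))           # ~[0, 0, 1], ~0.8
```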
Citations: 49
Illuminant Chromaticity from Image Sequences
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.412
V. Prinet, Dani Lischinski, M. Werman
We estimate illuminant chromaticity from temporal sequences, for scenes illuminated by either one or two dominant illuminants. While there are many methods for illuminant estimation from a single image, few works so far have focused on videos, and even fewer on multiple light sources. Our aim is to leverage the information provided by temporal acquisition, where the objects, the camera, or the light source is in motion, in order to estimate illuminant color without the need for user interaction or strong assumptions and heuristics. We introduce a simple physically-based formulation based on the assumption that the incident light chromaticity is constant over a short space-time domain. We show that a deterministic approach is not sufficient for accurate and robust estimation; however, a probabilistic formulation makes it possible to implicitly integrate away hidden factors that have been ignored by the physical model. Experimental results are reported on a dataset of natural video sequences and on the Gray Ball benchmark, indicating that we compare favorably with the state-of-the-art.
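The constant-chromaticity assumption implies that where only the illuminant-coloured (e.g. specular) component changes between two registered frames, the per-pixel RGB difference is parallel to the illuminant colour. The sketch below illustrates only this deterministic intuition, which the paper argues must be wrapped in a probabilistic model to be robust; the threshold, aggregation rule, and toy scene are assumptions.

```python
import numpy as np

def illuminant_chromaticity_from_pair(frame0, frame1, tau=0.02):
    """Rough illuminant chromaticity from two registered RGB frames.

    Where only the illuminant-coloured component changes between frames,
    the per-pixel RGB difference points along the illuminant colour, so
    normalised differences vote for its chromaticity. tau is an assumed
    threshold discarding pixels with too little temporal change.
    """
    diff = (frame1 - frame0).reshape(-1, 3)
    mag = np.linalg.norm(diff, axis=1)
    votes = diff[mag > tau]
    votes = np.abs(votes) / np.linalg.norm(votes, axis=1, keepdims=True)
    e = np.median(votes, axis=0)            # robust aggregate of the votes
    return e / e.sum()                      # chromaticity (r, g, b), sums to 1

# Toy scene: constant diffuse term plus a specular term whose strength varies.
rng = np.random.default_rng(1)
light = np.array([1.0, 0.8, 0.6])
diffuse = rng.uniform(0.2, 0.9, size=(120, 160, 3)) * light
s0 = rng.uniform(0.0, 0.2, size=(120, 160, 1))
s1 = s0 + rng.uniform(0.05, 0.3, size=(120, 160, 1))
f0, f1 = diffuse + s0 * light, diffuse + s1 * light
print(illuminant_chromaticity_from_pair(f0, f1))   # ~ light / light.sum()
```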
Citations: 26
Refractive Structure-from-Motion on Underwater Images
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.14
Anne Jordt, R. Koch
In underwater environments, cameras need to be confined in an underwater housing, viewing the scene through a piece of glass. In the case of flat-port underwater housings, light rays entering the camera housing are refracted twice, due to the different densities of water, glass, and air. This causes the usually straight rays of light to bend, and the commonly used pinhole camera model becomes invalid. When the pinhole camera model is used without explicitly modeling refraction in Structure-from-Motion (SfM) methods, a systematic model error occurs. Therefore, in this paper, we propose a system for computing the camera path and 3D points with explicit incorporation of refraction, using new methods for pose estimation. Additionally, a new error function is introduced for non-linear optimization, especially bundle adjustment. The proposed method increases reconstruction accuracy and is evaluated in a set of experiments, where its performance is compared to SfM with the perspective camera model.
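The double refraction at a flat port follows Snell's law at the air-glass and glass-water interfaces, which is exactly why the pinhole model breaks down. A minimal sketch of the vector form of Snell's law for a port perpendicular to the optical axis is given below; the refractive indices and example ray are assumed values, and the paper's pose estimation and bundle adjustment are not shown.

```python
import numpy as np

def refract(d, n, eta):
    """Refract unit direction d at a surface with unit normal n.

    eta = n_incident / n_transmitted; n points back towards the incident
    medium. Returns None on total internal reflection.
    """
    cos_i = -np.dot(d, n)
    k = 1.0 - eta ** 2 * (1.0 - cos_i ** 2)
    if k < 0.0:
        return None
    return eta * d + (eta * cos_i - np.sqrt(k)) * n

# Assumed setup: camera in air looking along +z, flat glass port, water beyond.
n_air, n_glass, n_water = 1.0, 1.5, 1.33     # assumed refractive indices
normal = np.array([0.0, 0.0, -1.0])          # port normal, towards the camera

d_air = np.array([0.3, 0.1, 1.0])
d_air /= np.linalg.norm(d_air)
d_glass = refract(d_air, normal, n_air / n_glass)
d_water = refract(d_glass, normal, n_glass / n_water)
# The ray bends twice, so the in-water ray no longer passes through the
# pinhole centre of projection.
print(d_air.round(3), d_glass.round(3), d_water.round(3))
```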
Citations: 70
Measuring Flow Complexity in Videos
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.140
Saad Ali
In this paper, a notion of flow complexity that measures the amount of interaction among objects is introduced, and an approach to compute it directly from a video sequence is proposed. The approach employs particle trajectories as the input representation of motion and maps them into a `braid'-based representation. The mapping is based on the observation that 2D trajectories of particles take the form of a braid in space-time due to the intermingling among particles over time. As a result of this mapping, the problem of estimating the flow complexity from particle trajectories becomes the problem of estimating braid complexity, which in turn can be computed by measuring the topological entropy of a braid. For this purpose, recently developed mathematical tools from braid theory are employed that allow rapid computation of the topological entropy of braids. The approach is evaluated on a dataset consisting of open source videos depicting variations in terms of types of moving objects, scene layout, camera view angle, motion patterns, and object densities. The results show that the proposed approach is able to quantify the complexity of the flow, and at the same time provides useful insights about the sources of the complexity.
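The trajectory-to-braid mapping can be illustrated by projecting particles onto one axis and recording every swap of neighbours as a braid generator, signed by which particle passes in front. The sketch below builds such a braid word under simplified assumptions (per-frame orderings, sign taken from the y coordinate at the swap frame); estimating the topological entropy from the resulting braid is a further step that is omitted here.

```python
import numpy as np

def braid_generators(traj):
    """Convert 2D particle trajectories into a braid word.

    traj : (T, N, 2) array of particle positions over T time steps.
    Particles are projected onto the x-axis; whenever two x-neighbours swap
    between consecutive frames, emit generator +/-(i+1), signed by which
    particle has the larger y at the swap frame. Returns a list of ints.
    """
    word = []
    T, N, _ = traj.shape
    order = np.argsort(traj[0, :, 0])            # particle ids, left to right
    for t in range(1, T):
        new_order = np.argsort(traj[t, :, 0])
        target_pos = {p: i for i, p in enumerate(new_order)}
        cur = list(order)
        changed = True
        while changed:                           # bubble old order into new one
            changed = False
            for i in range(N - 1):
                if target_pos[cur[i]] > target_pos[cur[i + 1]]:
                    a, b = cur[i], cur[i + 1]
                    sign = 1 if traj[t, a, 1] > traj[t, b, 1] else -1
                    word.append(sign * (i + 1))  # generator sigma_{i+1}
                    cur[i], cur[i + 1] = b, a
                    changed = True
        order = new_order
    return word

# Toy example: two particles orbiting each other plus one bystander.
t = np.linspace(0, 4 * np.pi, 200)
p0 = np.stack([np.cos(t), np.sin(t)], axis=1)
p1 = -p0
p2 = np.tile([3.0, 0.0], (len(t), 1))
print(braid_generators(np.stack([p0, p1, p2], axis=1)))
```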
Citations: 19
Semantically-Based Human Scanpath Estimation with HMMs
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.401
Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin
We present a method for estimating human scan paths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scan paths are modeled based on three principal factors that influence human attention, namely low-level feature saliency, spatial position, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Lévy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.
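The spatial term models gaze shifts as a Lévy flight with 2D Cauchy-distributed steps. The sketch below samples such shifts from an isotropic bivariate Cauchy distribution (a multivariate t with one degree of freedom) and chains them into a toy scanpath; the scale parameter and image size are assumptions, and the saliency and HMM semantic terms are ignored.

```python
import numpy as np

def sample_cauchy_shifts(n, scale=40.0, rng=None):
    """Draw n gaze shifts from an isotropic bivariate Cauchy distribution.

    Implemented as a multivariate t with 1 degree of freedom: heavy tails
    give the occasional very long saccade characteristic of a Levy flight.
    `scale` (pixels) is an assumed spread parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(size=(n, 2)) * scale
    w = rng.chisquare(df=1, size=(n, 1))
    return z / np.sqrt(w)

# Simulate a scanpath on an 800x600 image, clamping fixations to the frame.
rng = np.random.default_rng(0)
fix = np.array([400.0, 300.0])
path = [fix.copy()]
for shift in sample_cauchy_shifts(20, rng=rng):
    fix = np.clip(fix + shift, [0, 0], [799, 599])
    path.append(fix.copy())
print(np.array(path).round(1))
```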
Citations: 39
Space-Time Tradeoffs in Photo Sequencing
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.125
Tali Basha, Y. Moses, S. Avidan
Photo-sequencing is the problem of recovering the temporal order of a set of still images of a dynamic event, taken asynchronously by a set of uncalibrated cameras. Solving this problem is a first, crucial step for analyzing (or visualizing) the dynamic content of the scene captured by a large number of freely moving spectators. We propose a geometry-based solution to the photo-sequencing problem, followed by rank aggregation. Our algorithm trades spatial certainty for temporal certainty. Whereas the previous solution proposed by [4] relies on two images taken from the same static camera to eliminate uncertainty in space, we drop the static-camera assumption and replace it with temporal information available from images taken by the same (moving) camera. Our method thus overcomes the limitation of the static-camera assumption, and scales much better with the duration of the event and the spread of the cameras in space. We present successful results on challenging real data sets and large-scale synthetic data (250 images).
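After the geometric stage has produced many noisy pairwise "image i precedes image j" votes, a rank-aggregation step turns them into one global order. The sketch below applies a simple wins-count (Borda/Copeland-style) rule to a synthetic vote matrix; this is a generic stand-in for illustration, not necessarily the aggregation rule used in the paper.

```python
import numpy as np

def aggregate_order(votes):
    """Aggregate pairwise temporal votes into one global ordering.

    votes[i, j] = number of feature tracks voting that image i precedes
    image j. Each image is scored by how many other images it beats, and
    images are sorted by that score (a Borda/Copeland-style rule).
    """
    wins = (votes > votes.T).sum(axis=1)
    return list(np.argsort(-wins))

# Toy example: 5 images with true order 3,0,4,1,2 and 80%-correct votes.
rng = np.random.default_rng(2)
true_order = [3, 0, 4, 1, 2]
rank = {img: r for r, img in enumerate(true_order)}
votes = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        if i != j:
            p = 0.8 if rank[i] < rank[j] else 0.2
            votes[i, j] = rng.binomial(50, p)
print(aggregate_order(votes))        # very likely recovers [3, 0, 4, 1, 2]
```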
Citations: 24
Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.216
Marius Leordeanu, Andrei Zanfir, C. Sminchisescu
Estimating a dense correspondence field between successive video frames, under large displacement, is important in many visual learning and recognition tasks. We propose a novel sparse-to-dense matching method for motion field estimation and occlusion detection. As an alternative to the current coarse-to-fine approaches from the optical flow literature, we start from the higher level of sparse matching with rich appearance and geometric constraints collected over extended neighborhoods, using an occlusion aware, locally affine model. Then, we move towards the simpler, but denser classic flow field model, with an interpolation procedure that offers a natural transition between the sparse and the dense correspondence fields. We experimentally demonstrate that our appearance features and our complex geometric constraints permit the correct motion estimation even in difficult cases of large displacements and significant appearance changes. We also propose a novel classification method for occlusion detection that works in conjunction with the sparse-to-dense matching model. We validate our approach on the newly released Sintel dataset and obtain state-of-the-art results.
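The sparse-to-dense transition can be illustrated by fitting a local affine motion model to each pixel's nearest sparse matches and evaluating it at that pixel. The sketch below does this on a synthetic, purely affine flow; the neighbourhood size and unweighted least squares are assumptions, and the paper's occlusion reasoning is omitted.

```python
import numpy as np

def sparse_to_dense_affine(pts, flows, H, W, k=8):
    """Interpolate sparse matches into a dense flow field.

    pts   : (M, 2) sparse match locations (x, y).
    flows : (M, 2) their displacement vectors.
    For every pixel, fit a 2D affine motion model to its k nearest sparse
    matches by least squares and evaluate it at the pixel.
    """
    dense = np.zeros((H, W, 2))
    for y in range(H):
        for x in range(W):
            d2 = ((pts - [x, y]) ** 2).sum(axis=1)
            nn = np.argsort(d2)[:k]
            A = np.hstack([pts[nn], np.ones((k, 1))])        # [x y 1]
            coef, *_ = np.linalg.lstsq(A, flows[nn], rcond=None)
            dense[y, x] = np.array([x, y, 1.0]) @ coef
    return dense

# Toy example: sparse samples of an affine flow are interpolated exactly.
rng = np.random.default_rng(3)
pts = rng.uniform(0, 32, size=(60, 2))
true_A = np.array([[0.02, -0.01], [0.01, 0.03], [1.5, -0.5]])  # affine params
flows = np.hstack([pts, np.ones((60, 1))]) @ true_A
dense = sparse_to_dense_affine(pts, flows, 32, 32)
print(np.abs(dense[16, 16] - np.array([16, 16, 1.0]) @ true_A))  # ~0
```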
Citations: 74
Modeling the Calibration Pipeline of the Lytro Camera for High Quality Light-Field Image Reconstruction
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.407
Donghyeon Cho, Minhaeng Lee, Sunyeong Kim, Yu-Wing Tai
Light-field imaging systems have received much attention recently as the next-generation camera model. A light-field imaging system consists of three parts: data acquisition, manipulation, and application. Given an acquisition system, it is important to understand how a light-field camera converts its raw image into the resulting refocused image. In this paper, using the Lytro camera as an example, we describe step-by-step procedures to calibrate a raw light-field image. In particular, we are interested in knowing the spatial and angular coordinates of the micro lens array and the resampling process for image reconstruction. Since Lytro uses a hexagonal arrangement of micro lens images, additional treatment in calibration is required. After calibration, we analyze and compare the performance of several resampling methods for image reconstruction with and without calibration. Finally, a learning-based interpolation method is proposed, which demonstrates higher quality image reconstruction than previous interpolation methods, including the method used in the Lytro software.
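The spatial part of such a calibration amounts to locating micro-lens centres on a rotated, offset hexagonal grid. The sketch below generates such a grid; the pitch, rotation, and offset are placeholder values that a real pipeline would estimate from a white (flat-field) image.

```python
import numpy as np

def hex_lens_centers(rows, cols, pitch, rotation=0.0, offset=(0.0, 0.0)):
    """Generate micro-lens centre coordinates on a hexagonal grid.

    Odd rows are shifted by half a pitch horizontally and rows are spaced
    pitch * sqrt(3)/2 apart, giving hexagonal packing. `rotation` (radians)
    and `offset` (pixels) model misalignment between array and sensor.
    """
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    x = (c + 0.5 * (r % 2)) * pitch
    y = r * pitch * np.sqrt(3) / 2
    pts = np.stack([x.ravel(), y.ravel()], axis=1)
    R = np.array([[np.cos(rotation), -np.sin(rotation)],
                  [np.sin(rotation),  np.cos(rotation)]])
    return pts @ R.T + np.asarray(offset)

# Placeholder calibration values (a real pipeline estimates these from a
# white image): ~10-pixel pitch, a slight rotation, and a sub-pixel offset.
centers = hex_lens_centers(rows=8, cols=10, pitch=10.0,
                           rotation=0.002, offset=(3.2, 4.7))
print(centers[:5].round(2))
```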
Citations: 145
Finding Causal Interactions in Video Sequences
Pub Date: 2013-12-01 | DOI: 10.1109/ICCV.2013.444
Mustafa Ayazoglu, Burak Yılmaz, M. Sznaier, O. Camps
This paper considers the problem of detecting causal interactions in video clips. Specifically, the goal is to detect whether the actions of a given target can be explained in terms of the past actions of a collection of other agents. We propose to solve this problem by recasting it into a directed graph topology identification, where each node corresponds to the observed motion of a given target, and each link indicates the presence of a causal correlation. As shown in the paper, this leads to a block-sparsification problem that can be efficiently solved using a modified Group-Lasso type approach, capable of handling missing data and outliers (due for instance to occlusion and mis-identified correspondences). Moreover, this approach also identifies time instants where the interactions between agents change, thus providing event detection capabilities. These results are illustrated with several examples involving non-trivial interactions amongst several human subjects.
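The block-sparsification idea can be illustrated with a group-lasso regression: the target's trajectory is regressed on lagged trajectories of candidate agents, and the groups whose coefficient blocks survive indicate incoming causal links. The sketch below implements a plain proximal-gradient group lasso on synthetic 1D signals; the lag, penalty weight, and thresholds are assumptions, and it is a simplified stand-in for the paper's modified Group-Lasso formulation.

```python
import numpy as np

def group_lasso(X_groups, y, lam=1.0, lr=None, iters=500):
    """Block-sparse regression via proximal gradient (ISTA).

    X_groups : list of (T, p) design blocks, one per candidate cause.
    y        : (T,) target signal.
    Minimises 0.5*||y - sum_g X_g b_g||^2 + lam * sum_g ||b_g||_2 and
    reports which groups keep a nonzero coefficient block.
    """
    X = np.hstack(X_groups)
    sizes = [g.shape[1] for g in X_groups]
    idx = np.cumsum([0] + sizes)
    lr = lr or 1.0 / np.linalg.norm(X, 2) ** 2       # step size <= 1/L
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        z = b - lr * (X.T @ (X @ b - y))             # gradient step
        for g in range(len(sizes)):                  # block soft-threshold
            blk = z[idx[g]:idx[g + 1]]
            nrm = np.linalg.norm(blk)
            blk *= max(0.0, 1.0 - lr * lam / (nrm + 1e-12))
        b = z
    return [np.linalg.norm(b[idx[g]:idx[g + 1]]) > 1e-6 for g in range(len(sizes))]

# Toy example: the target depends on agent 0's past but not agent 1's.
rng = np.random.default_rng(4)
T, lag = 300, 3
a0, a1 = rng.normal(size=T + lag), rng.normal(size=T + lag)
X0 = np.stack([a0[lag - d:T + lag - d] for d in range(1, lag + 1)], axis=1)
X1 = np.stack([a1[lag - d:T + lag - d] for d in range(1, lag + 1)], axis=1)
y = 0.8 * a0[lag - 1:T + lag - 1] + 0.05 * rng.normal(size=T)
print(group_lasso([X0, X1], y, lam=5.0))             # expect [True, False]
```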
Citations: 19