
Latest publications: 2013 IEEE International Conference on Computer Vision

A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.264
Xiaoyang Wang, Q. Ji
This paper proposes a unified probabilistic model to model the relationships between attributes and objects for attribute prediction and object recognition. As a list of semantically meaningful properties of objects, attributes generally relate to each other statistically. In this paper, we propose a unified probabilistic model to automatically discover and capture both the object-dependent and object-independent attribute relationships. The model utilizes the captured relationships to benefit both attribute prediction and object recognition. Experiments on four benchmark attribute datasets demonstrate the effectiveness of the proposed unified model for improving attribute prediction as well as object recognition in both standard and zero-shot learning cases.
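The abstract does not detail the model structure, so as a loose illustration only (not the authors' model, which also captures dependencies among attributes), here is the simpler object-conditioned baseline such models generalize: scoring object classes from predicted attribute probabilities under an independence assumption. All names and array shapes are hypothetical.

```python
import numpy as np

def object_posterior(attr_scores, attr_given_obj, obj_prior):
    """Score object classes from predicted attributes, naive-Bayes style.

    attr_scores:    (A,) predicted probability that each attribute is present
    attr_given_obj: (O, A) estimated P(attribute present | object class)
    obj_prior:      (O,) prior over object classes
    """
    eps = 1e-9
    # Expected log-likelihood of each class given the soft attribute predictions
    log_lik = (attr_scores * np.log(attr_given_obj + eps)
               + (1 - attr_scores) * np.log(1 - attr_given_obj + eps)).sum(axis=1)
    log_post = np.log(obj_prior + eps) + log_lik
    post = np.exp(log_post - log_post.max())   # subtract max for stability
    return post / post.sum()
```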
Citations: 79
Heterogeneous Auto-similarities of Characteristics (HASC): Exploiting Relational Information for Classification
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.105
Marco San-Biagio, M. Crocco, M. Cristani, Samuele Martelli, Vittorio Murino
Capturing the essential characteristics of visual objects by considering how their features are inter-related is a recent philosophy of object classification. In this paper, we embed this principle in a novel image descriptor, dubbed Heterogeneous Auto-Similarities of Characteristics (HASC). HASC is applied to heterogeneous dense feature maps, encoding linear relations by covariances and nonlinear associations through information-theoretic measures such as mutual information and entropy. In this way, highly complex structural information can be expressed in a compact, scale-invariant and robust manner. The effectiveness of HASC is tested on many diverse detection and classification scenarios, considering objects, textures and pedestrians, on widely known benchmarks (Caltech-101, Brodatz, Daimler Multi-Cue). In all the cases, the results obtained with standard classifiers demonstrate the superiority of HASC with respect to the most widely adopted local feature descriptors, such as SIFT, HOG, LBP and feature covariances. In addition, HASC sets the state of the art on the Brodatz texture dataset and the Daimler Multi-Cue pedestrian dataset, without exploiting ad-hoc sophisticated classifiers.
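As a rough sketch of the two ingredients named in the abstract (covariance for linear relations, mutual information for nonlinear ones), the code below builds a HASC-like vector from d dense feature maps of a patch, estimating mutual information with a joint histogram. The exact HASC construction (feature channels, normalization, entropy terms) is not reproduced; the bin count and shapes are assumptions.

```python
import numpy as np

def hasc_like_descriptor(feature_maps, bins=8):
    """Covariance (linear relations) plus pairwise mutual information
    (nonlinear relations) among d dense feature maps of an image patch.

    feature_maps: (d, H, W) array, one map per feature channel
                  (e.g. intensity, gradients). Returns stacked upper triangles.
    """
    d = feature_maps.shape[0]
    X = feature_maps.reshape(d, -1)          # d features x N pixel samples
    cov = np.cov(X)                          # linear dependencies
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            h2d, _, _ = np.histogram2d(X[i], X[j], bins=bins)
            pxy = h2d / h2d.sum()            # joint distribution estimate
            px = pxy.sum(1, keepdims=True)
            py = pxy.sum(0, keepdims=True)
            nz = pxy > 0                     # avoid log(0)
            mi[i, j] = mi[j, i] = (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()
    iu = np.triu_indices(d)
    return np.concatenate([cov[iu], mi[iu]])
```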
Citations: 41
Fingerspelling Recognition with Semi-Markov Conditional Random Fields
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.192
Taehwan Kim, Gregory Shakhnarovich, Karen Livescu
Recognition of gesture sequences is in general a very difficult problem, but in certain domains the difficulty may be mitigated by exploiting the domain's ``grammar''. One such grammatically constrained gesture sequence domain is sign language. In this paper we investigate the case of fingerspelling recognition, which can be very challenging due to the quick, small motions of the fingers. Most prior work on this task has assumed a closed vocabulary of fingerspelled words; here we study the more natural open-vocabulary case, where the only domain knowledge is the possible fingerspelled letters and statistics of their sequences. We develop a semi-Markov conditional model approach, where feature functions are defined over segments of video and their corresponding letter labels. We use classifiers of letters and linguistic hand shape features, along with expected motion profiles, to define segmental feature functions. This approach improves the letter error rate (Levenshtein distance between hypothesized and correct letter sequences) from 16.3% using a hidden Markov model baseline to 11.6% using the proposed semi-Markov model.
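The reported letter error rate is the Levenshtein distance between hypothesized and reference letter sequences, normalized by the reference length; a minimal self-contained sketch:

```python
def letter_error_rate(ref, hyp):
    """Letter error rate: Levenshtein (edit) distance between reference and
    hypothesized letter sequences, normalized by the reference length."""
    m, n = len(ref), len(hyp)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                      # cost of deleting i letters
    for j in range(n + 1):
        D[0][j] = j                      # cost of inserting j letters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = D[i-1][j-1] + (ref[i-1] != hyp[j-1])   # substitution/match
            D[i][j] = min(sub, D[i-1][j] + 1, D[i][j-1] + 1)
    return D[m][n] / max(m, 1)

# e.g. letter_error_rate("spell", "spall") == 0.2  (one substitution in five)
```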
Citations: 24
Efficient Image Dehazing with Boundary Constraint and Contextual Regularization
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.82
Gaofeng Meng, Ying Wang, Jiangyong Duan, Shiming Xiang, Chunhong Pan
Images captured in foggy weather conditions often suffer from bad visibility. In this paper, we propose an efficient regularization method to remove haze from a single input image. Our method benefits much from an exploration of the inherent boundary constraint on the transmission function. This constraint, combined with a weighted L1-norm based contextual regularization, is modeled into an optimization problem to estimate the unknown scene transmission. A quite efficient algorithm based on variable splitting is also presented to solve the problem. The proposed method requires only a few general assumptions and can restore a high-quality haze-free image with faithful colors and fine image details. Experimental results on a variety of haze images demonstrate the effectiveness and efficiency of the proposed method.
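A hedged sketch of the boundary-constraint idea under the standard haze model I = J·t + A·(1 − t): requiring the recovered radiance J to stay within assumed per-channel bounds C0 ≤ J ≤ C1 gives a per-pixel lower bound on the transmission t. The airlight A and the bound values are illustrative inputs, and the paper's weighted L1-norm contextual regularization, which smooths this raw estimate, is not reproduced here.

```python
import numpy as np

def boundary_transmission(I, A, C0=30.0, C1=300.0):
    """Per-pixel lower bound on transmission from the boundary constraint
    C0 <= J <= C1 applied to the haze model I = J*t + A*(1-t).

    I: (H, W, 3) hazy image, A: (3,) airlight. C0/C1 are assumed radiance
    bounds (values here are illustrative, for a 0-255 image).
    """
    eps = 1e-6
    # From J = A + (I - A)/t, the bounds on J translate into t >= each term.
    tb = np.maximum((A - I) / np.maximum(A - C0, eps),
                    (I - A) / np.maximum(C1 - A, eps))
    tb = tb.max(axis=2)                  # tightest bound over color channels
    return np.clip(tb, 0.0, 1.0)

# Recover scene radiance once t has been estimated (and regularized):
# J = A + (I - A) / np.maximum(t[..., None], 0.1)
```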
Citations: 897
Coupling Alignments with Recognition for Still-to-Video Face Recognition
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.409
Zhiwu Huang, Xiaowei Zhao, S. Shan, Ruiping Wang, Xilin Chen
Still-to-Video (S2V) face recognition systems typically need to match faces in low-quality videos captured under unconstrained conditions against high-quality still face images, which is very challenging because of noise, image blur, low face resolution, varying head pose, complex lighting, and alignment difficulty. To address the problem, one solution is to select the frames of `best quality' from videos (hereinafter called quality alignment in this paper). Meanwhile, the faces in the selected frames should also be geometrically aligned to the offline, well-aligned still faces in the gallery. In this paper, we discover that the interactions among the three tasks (quality alignment, geometric alignment, and face recognition) can benefit from each other, and thus the tasks should be performed jointly. With this in mind, we propose a Coupling Alignments with Recognition (CAR) method to tightly couple these tasks via low-rank regularized sparse representation in a unified framework. Our method makes the three tasks promote each other through a joint optimization in an Augmented Lagrange Multiplier routine. Extensive experiments on two challenging S2V datasets demonstrate that our method outperforms the state-of-the-art methods impressively.
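The paper's joint low-rank/sparse formulation and its Augmented Lagrange Multiplier solver are too involved for a short sketch; for background, here is plain sparse-representation-based classification, the building block being regularized, with the lasso subproblem solved by ISTA. The gallery matrix D (columns are gallery faces), labels, and λ are hypothetical.

```python
import numpy as np

def ista(D, y, lam=0.1, iters=200):
    """Plain ISTA for the lasso: minimize 0.5*||D x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(iters):
        g = D.T @ (D @ x - y)                # gradient of the smooth term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

def src_classify(D, labels, y, lam=0.1):
    """Sparse-representation classification: code the probe y over the
    gallery D, then assign the class with the smallest class-restricted
    reconstruction residual."""
    x = ista(D, y, lam)
    best, best_r = None, np.inf
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        r = np.linalg.norm(y - D[:, mask] @ x[mask])
        if r < best_r:
            best, best_r = c, r
    return best
```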
Citations: 35
No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.150
Yan Yan, E. Ricci, Subramanian Ramanathan, O. Lanz, N. Sebe
We propose a novel Multi-Task Learning framework (FEGA-MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. As the target (person) moves, distortions in facial appearance owing to camera perspective and scale severely impede the performance of traditional head pose classification methods. FEGA-MTL operates on a dense uniform spatial grid and learns appearance relationships across partitions, as well as partition-specific appearance variations for a given head pose, to build region-specific classifiers. Guided by two graphs that a priori model appearance similarity among (i) grid partitions, based on camera geometry, and (ii) head pose classes, the learner efficiently clusters appearance-wise related grid partitions to derive the optimal partitioning. For pose classification, upon determining the target's position using a person tracker, the appropriate region-specific classifier is invoked. Experiments confirm that FEGA-MTL achieves state-of-the-art classification with little training data.
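As a generic illustration of graph-guided multi-task learning (not the FEGA-MTL objective itself), the sketch below fits one linear predictor per spatial region and couples the weights of graph-adjacent regions with a quadratic penalty; in the paper's setting the edges would come from the two a priori graphs described above. The learning rate and iteration count are placeholders.

```python
import numpy as np

def graph_mtl_ridge(Xs, ys, edges, lam=1.0, lr=1e-3, iters=2000):
    """Graph-regularized multi-task least squares: minimize
    0.5 * sum_t ||X_t w_t - y_t||^2 + lam * sum_{(s,t) in edges} ||w_s - w_t||^2
    by gradient descent, so related tasks (e.g. neighboring grid regions)
    are encouraged to share similar weights.

    Xs: list of (n_t, d) design matrices, ys: list of (n_t,) targets,
    edges: list of (s, t) index pairs over tasks.
    """
    T, d = len(Xs), Xs[0].shape[1]
    W = np.zeros((T, d))
    for _ in range(iters):
        G = np.zeros_like(W)
        for t in range(T):
            G[t] = Xs[t].T @ (Xs[t] @ W[t] - ys[t])   # data-fit gradient
        for s, t in edges:
            G[s] += 2 * lam * (W[s] - W[t])           # graph coupling gradient
            G[t] += 2 * lam * (W[t] - W[s])
        W -= lr * G
    return W
```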
Citations: 118
Learning to Predict Gaze in Egocentric Video
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.399
Yin Li, A. Fathi, James M. Rehg
We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in camera wearer's behaviors. Specifically, we compute the camera wearer's head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate that we get a significant performance boost in recognizing daily actions and segmenting foreground objects by plugging in our gaze predictions into state-of-the-art methods.
Citations: 240
Efficient and Robust Large-Scale Rotation Averaging
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.70
Avishek Chatterjee, V. Govindu
In this paper we address the problem of robust and efficient averaging of relative 3D rotations. Apart from having an interesting geometric structure, robust rotation averaging addresses the need for a good initialization for the large-scale optimization used in structure-from-motion pipelines. Such pipelines often use unstructured image datasets harvested from the internet, thereby requiring an initialization method that is robust to outliers. Our approach works on the Lie group structure of 3D rotations and solves the problem of large-scale robust rotation averaging in two ways. Firstly, we use modern ℓ1 optimizers to carry out robust averaging of relative rotations that is efficient, scalable and robust to outliers. In addition, we also develop a two-step method that uses the ℓ1 solution as an initialization for an iteratively reweighted least squares (IRLS) approach. These methods achieve excellent results on large-scale, real-world datasets and significantly outperform existing methods, i.e. the state-of-the-art discrete-continuous optimization method of [3] as well as the Weiszfeld method of [8]. We demonstrate the efficacy of our method on two large-scale real-world datasets and also provide the results of the two aforementioned methods for comparison.
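For intuition, here is Weiszfeld-style ℓ1 averaging of a single rotation in the tangent space of SO(3), using scipy's Rotation for the exponential and logarithm maps. The paper solves the harder problem of averaging relative rotations over an entire view graph and follows the ℓ1 stage with IRLS; this single-rotation sketch only shows the robust-median mechanism.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def l1_rotation_average(rotations, iters=50, eps=1e-9):
    """Weiszfeld-style geodesic L1 averaging of a set of 3D rotations.

    Each iteration lifts the inputs to the tangent space at the current
    estimate (log map), applies the Weiszfeld update for the L1 median
    there, and maps back (exp map). Robust to a minority of outliers.
    """
    R = rotations[0]                          # initialize at any input rotation
    for _ in range(iters):
        # log map of each input rotation relative to the current estimate
        vs = np.stack([(R.inv() * Q).as_rotvec() for Q in rotations])
        norms = np.maximum(np.linalg.norm(vs, axis=1), eps)
        # Weiszfeld update toward the L1 median of the tangent vectors
        step = (vs / norms[:, None]).sum(0) / (1.0 / norms).sum()
        R = R * Rotation.from_rotvec(step)
        if np.linalg.norm(step) < 1e-10:
            break
    return R
```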
Citations: 217
High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.204
Yudeog Han, Joon-Young Lee, In-So Kweon
We present a novel framework to estimate the detailed shape of diffuse objects with uniform albedo from a single RGB-D image. To estimate accurate lighting in a natural illumination environment, we introduce a general lighting model consisting of two components: a global and a local model. The global lighting model is estimated from the RGB-D input using the low-dimensional characteristic of a diffuse reflectance model. The local lighting model represents spatially varying illumination, and it is estimated by using the smoothly varying characteristic of illumination. With both the global and local lighting models, we can accurately estimate complex lighting variations under uncontrolled natural illumination conditions. For high-quality shape capture, a shape-from-shading approach is applied with the estimated lighting model. Since the entire process is done with a single RGB-D input, our method is capable of capturing the high-quality shape details of a dynamic object under natural illumination. Experimental results demonstrate the feasibility and effectiveness of our method, which dramatically improves the shape details of the rough depth input.
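The global lighting step can be illustrated as a least-squares fit of a low-order diffuse lighting model to observed intensities and depth-derived normals. This first-order (4-coefficient) spherical-harmonics version is a simplification used here for brevity, and all shapes are assumptions.

```python
import numpy as np

def estimate_global_lighting(normals, intensities):
    """Least-squares fit of a low-dimensional diffuse lighting model
    (first-order spherical harmonics: ambient plus three directional
    terms) to intensities and normals derived from the depth map.

    normals:     (N, 3) unit surface normals
    intensities: (N,) observed grayscale values
    Returns the 4 coefficients l such that I ~ [1, nx, ny, nz] @ l.
    """
    B = np.hstack([np.ones((normals.shape[0], 1)), normals])
    l, *_ = np.linalg.lstsq(B, intensities, rcond=None)
    return l

# Shading predicted by the fitted model; the residual between observed
# and predicted shading is what a shape-from-shading step then refines:
# pred = np.hstack([np.ones((normals.shape[0], 1)), normals]) @ l
```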
Citations: 97
Supervised Binary Hash Code Learning with Jensen Shannon Divergence
Pub Date : 2013-12-01 DOI: 10.1109/ICCV.2013.325
Lixin Fan
This paper proposes to learn binary hash codes within a statistical learning framework, in which an upper bound of the probability of Bayes decision errors is derived for different forms of hash functions and a rigorous proof of the convergence of the upper bound is presented. Consequently, minimizing such an upper bound leads to consistent performance improvements of existing hash code learning algorithms, regardless of whether original algorithms are unsupervised or supervised. This paper also illustrates a fast hash coding method that exploits simple binary tests to achieve orders of magnitude improvement in coding speed as compared to projection based methods.
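As a hedged illustration of scoring "simple binary tests" with the Jensen-Shannon divergence, the sketch below picks the feature/threshold sign test whose output bit best separates two classes. The paper's actual criterion, an upper bound on the Bayes decision error, is not reproduced here.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * np.log2(a / b)).sum()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def pick_binary_test(X, y, thresholds):
    """Choose the (feature, threshold) sign test that best separates two
    classes, scored by the JS divergence between the class-conditional
    bit distributions P(bit | y=0) and P(bit | y=1).

    X: (N, F) features, y: (N,) binary labels, thresholds: candidate cuts.
    """
    best, best_score = None, -1.0
    for f in range(X.shape[1]):
        for t in thresholds:
            bit = X[:, f] > t
            p0 = np.array([np.mean(~bit[y == 0]), np.mean(bit[y == 0])])
            p1 = np.array([np.mean(~bit[y == 1]), np.mean(bit[y == 1])])
            score = js_divergence(p0, p1)
            if score > best_score:
                best, best_score = (f, t), score
    return best, best_score
```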
Citations: 13