
Latest Publications: 2017 IEEE International Conference on Computer Vision (ICCV)

Exploiting Spatial Structure for Localizing Manipulated Image Regions
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.532
Jawadul H. Bappy, A. Roy-Chowdhury, Jason Bunk, L. Nataraj, B. S. Manjunath
The advent of high-tech journaling tools makes it possible to manipulate an image in ways that easily evade state-of-the-art image tampering detection approaches. The recent success of deep learning approaches in different recognition tasks inspires us to develop a high-confidence detection framework that can localize manipulated regions in an image. Unlike semantic object segmentation, where all meaningful regions (objects) are segmented, the localization of image manipulation focuses only on the possibly tampered regions, which makes the problem even more challenging. To formulate the framework, we employ a hybrid CNN-LSTM model to capture discriminative features between manipulated and non-manipulated regions. One of the key properties of manipulated regions is that they exhibit discriminative features along the boundaries shared with neighboring non-manipulated pixels. Our motivation is to learn the boundary discrepancy, i.e., the spatial structure, between manipulated and non-manipulated regions with the combination of LSTM and convolution layers. We train the network end-to-end, learning the parameters through back-propagation given ground-truth mask information. The overall framework is capable of detecting different types of image manipulation, including copy-move, removal, and splicing. Our model shows promising results in localizing manipulated regions, as demonstrated through rigorous experimentation on three diverse datasets.
Citations: 179
Delving into Salient Object Subitizing and Detection
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.120
Shengfeng He, Jianbo Jiao, Xiaodan Zhang, Guoqiang Han, Rynson W. H. Lau
Subitizing (i.e., instantly judging the number of items) and detecting salient objects are inborn human abilities. These two tasks influence each other in the human visual system. In this paper, we delve into the complementarity of these two tasks. We propose a multi-task deep neural network with weight prediction for salient object detection, where the parameters of an adaptive weight layer are dynamically determined by an auxiliary subitizing network. The numerical representation of salient objects is therefore embedded into the spatial representation. The proposed joint network can be trained end-to-end using backpropagation. Experiments show that the proposed multi-task network outperforms existing multi-task architectures, and that the auxiliary subitizing network provides strong guidance to salient object detection by reducing false positives and producing coherent saliency maps. Moreover, the proposed method is unconstrained and able to handle images with or without salient objects. Finally, we show state-of-the-art performance on different salient object datasets.
Citations: 51
2D-Driven 3D Object Detection in RGB-D Images
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.495
Jean Lahoud, Bernard Ghanem
In this paper, we present a technique that places 3D bounding boxes around objects in an RGB-D scene. Our approach makes the best use of the 2D information to quickly reduce the search space in 3D, benefiting from state-of-the-art 2D object detection techniques. We then use the 3D information to orient, place, and score bounding boxes around objects. We independently estimate the orientation of every object using previous techniques that utilize surface normal information. Object locations and sizes in 3D are learned using a multilayer perceptron (MLP). In the final step, we refine our detections based on object class relations within a scene. When compared to state-of-the-art detection methods that operate almost entirely in the sparse 3D domain, extensive experiments on the well-known SUN RGB-D dataset [29] show that our proposed method is much faster (4.1s per image) at detecting 3D objects in RGB-D images and performs better (3 mAP higher) than a state-of-the-art method that is 4.7 times slower, and comparably to a method that is two orders of magnitude slower. This work hints that 2D-driven object detection in 3D should be explored further, especially in cases where the 3D input is sparse.
Citations: 146
Mutual Enhancement for Detection of Multiple Logos in Sports Videos
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.519
Yuan Liao, Xiaoqing Lu, Chengcui Zhang, Yongtao Wang, Zhi Tang
Detecting the frequency and duration of logos in sports videos provides sponsors an effective way to evaluate their advertising efforts. However, general-purpose object detection methods cannot address all the challenges in sports videos. In this paper, we propose a mutual-enhanced approach that improves the detection of a logo using information obtained from other logos occurring simultaneously. In a Fast-RCNN-based framework, we first introduce a homogeneity-enhanced re-ranking method that analyzes the characteristics of homogeneous logos in each frame, including type repetition, color consistency, and mutual exclusion. Unlike conventional enhancement mechanisms, which improve weak proposals using dominant ones, our mutual method can also enhance relatively strong proposals using weak ones. Mutual enhancement is also built into our frame propagation mechanism, which improves logo detection by exploiting the continuity of logos across frames. We use a tennis video dataset and an associated logo collection for detection evaluation. Experiments show that the proposed method outperforms existing methods with higher accuracy.
Citations: 27
Transferring Objects: Joint Inference of Container and Human Pose
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.319
Hanqing Wang, Wei Liang, L. Yu
Transferring objects from one place to another is a common task humans perform in daily life. During this process, it is usually intuitive for humans to choose an object as a proper container and to use an efficient pose to carry objects; yet, it is non-trivial for current computer vision and machine learning algorithms. In this paper, we propose an approach that jointly infers the container and human pose for transferring objects by minimizing the costs associated with both object and pose candidates. Our approach predicts which object to choose as a container while reasoning about how humans interact with physical surroundings to accomplish the transfer task given visual input. In the learning phase, the presented method learns how humans make rational choices of containers and poses for transferring different objects, as well as the physical quantities required by the transfer task (e.g., compatibility between container and containee, energy cost of the carrying pose), via a structured learning approach. In the inference phase, given a scanned 3D scene with different object candidates and a dictionary of human poses, our approach infers the best object to serve as a container, together with the human pose for transferring a given object.
Citations: 10
AnnArbor: Approximate Nearest Neighbors Using Arborescence Coding
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.523
Artem Babenko, V. Lempitsky
To compress large datasets of high-dimensional descriptors, modern quantization schemes learn multiple codebooks and then represent individual descriptors as combinations of codewords. Once the codebooks are learned, these schemes encode descriptors independently. In contrast, we present a new coding scheme that arranges dataset descriptors into a set of arborescence graphs, and then encodes each non-root descriptor by quantizing its displacement with respect to its parent node. By optimizing the structure of the arborescences, our coding scheme can decrease the quantization error considerably, while incurring only minimal overhead in memory footprint and nearest-neighbor search speed in the compressed dataset compared to independent quantization. The advantage of the proposed scheme is demonstrated in a series of experiments with datasets of SIFT and deep descriptors.
Citations: 7
Multi-label Learning of Part Detectors for Heavily Occluded Pedestrian Detection
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.377
Chunluan Zhou, Junsong Yuan
Detecting pedestrians that are partially occluded remains a challenging problem due to the variations and uncertainties of partial occlusion patterns. Following the commonly used framework of handling partial occlusions by part detection, we propose a multi-label learning approach that jointly learns part detectors to capture partial occlusion patterns. The part detectors share a set of decision trees via boosting, which exploits part correlations and also reduces the computational cost of applying the detectors. The learned decision trees capture the overall distribution of all the parts. When used individually as pedestrian detectors, our jointly learned part detectors show better performance than their counterparts learned separately under different occlusion situations. The learned part detectors can be further integrated to better detect partially occluded pedestrians. Experiments on the Caltech dataset show state-of-the-art performance of our approach for detecting heavily occluded pedestrians.
Citations: 112
Transformed Low-Rank Model for Line Pattern Noise Removal
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.191
Yi Chang, Luxin Yan, Sheng Zhong
This paper addresses the problem of removing line pattern noise, such as rain streaks and hyperspectral stripes, from a single image. Most previous methods model the line pattern noise in the original image domain, failing to explicitly exploit its directional characteristic and thus resulting in a redundant subspace with poor representation ability for such noise. To achieve a compact subspace for the line pattern structure, in this work we incorporate into the image decomposition model a transformation that maps the input image to a domain where the line pattern appearance has an extremely distinct low-rank structure, which naturally allows us to enforce a low-rank prior for extracting the line pattern streaks/stripes from the noisy image. Moreover, random noise is usually mixed with the line pattern noise, which makes this challenging problem even more difficult. While previous methods resort to the spectral or temporal correlation of multiple images, we give a detailed analysis of the differences between noisy and clean images in both the local gradient and nonlocal domains, and propose a compositional directional total variation and low-rank prior for the image layer, thus simultaneously accommodating both types of noise. The proposed method has been evaluated on two different tasks, removing mixed random-stripe noise from remote sensing images and removing rain streaks, and obtains very impressive performance on both.
Citations: 125
Summarization and Classification of Wearable Camera Streams by Learning the Distributions over Deep Features of Out-of-Sample Image Sequences
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.464
Alessandro Penna, Sadegh Mohammadi, N. Jojic, Vittorio Murino
A popular approach to training classifiers for new image classes is to reuse the lower layers of a pre-trained feed-forward neural network and retrain only the top. Thus, most layers simply serve as highly nonlinear feature extractors. While these features were found useful for classifying a variety of scenes and objects, previous work also demonstrated unusual levels of sensitivity to the input, especially for images that veer too far from the training distribution. This can lead to surprising results, as an imperceptible change in an image can be enough to completely change the predicted class. This occurs in particular in applications involving personal data, typically acquired with wearable cameras (e.g., visual lifelogs), where the problem is made more complex by the dearth of new labeled training data, which makes supervised learning with deep models difficult. To alleviate these problems, in this paper we propose a new generative model that captures the feature distribution in new data. Its latent space then becomes more representative of the new data while still retaining the generalization properties. In particular, we use constrained Markov walks over a counting grid to model image sequences, which not only yields good latent representations but also allows for excellent classification with only a handful of labeled training examples of the new scenes or objects, a scenario typical of lifelogging applications.
Citations: 5
PolyFit: Polygonal Surface Reconstruction from Point Clouds
Pub Date: 2017-10-01 DOI: 10.1109/ICCV.2017.258
L. Nan, Peter Wonka
We propose a novel framework for reconstructing lightweight polygonal surfaces from point clouds. Unlike traditional methods that focus on either extracting good geometric primitives or obtaining proper arrangements of primitives, the emphasis of this work lies in intersecting the primitives (planes only) and seeking an appropriate combination of them to obtain a manifold polygonal surface model without boundary. We show that reconstruction from point clouds can be cast as a binary labeling problem. Our method is based on a hypothesizing-and-selection strategy. We first generate a reasonably large set of face candidates by intersecting the extracted planar primitives. Then an optimal subset of the candidate faces is selected through optimization. Our optimization is based on a binary linear programming formulation under hard constraints that force the final polygonal surface model to be manifold and watertight. Experiments on point clouds from various sources demonstrate that our method can generate lightweight polygonal surface models of arbitrary piecewise planar objects. Moreover, our method is capable of recovering sharp features and is robust to noise, outliers, and missing data.
Citations: 152