
2017 IEEE International Conference on Computer Vision (ICCV): Latest Publications

Online Video Object Detection Using Association LSTM
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.257
Yongyi Lu, Cewu Lu, Chi-Keung Tang
Video object detection is a fundamental tool for many applications. Since direct application of image-based object detection cannot leverage the rich temporal information inherent in video data, we advocate detecting long-range video object patterns. While the Long Short-Term Memory (LSTM) has been the de facto choice for such detection, current LSTM formulations cannot fundamentally model object association between consecutive frames. In this paper, we propose the association LSTM to address this fundamental association problem. Association LSTM not only directly regresses object locations and classifies object categories, but also produces association features to represent each output object. By minimizing the matching error between these features, we learn how to associate objects in two consecutive frames. Additionally, our method works in an online manner, which is important for most video tasks. Our approach outperforms traditional video object detection methods on standard video datasets.
{"title":"Online Video Object Detection Using Association LSTM","authors":"Yongyi Lu, Cewu Lu, Chi-Keung Tang","doi":"10.1109/ICCV.2017.257","DOIUrl":"https://doi.org/10.1109/ICCV.2017.257","url":null,"abstract":"Video object detection is a fundamental tool for many applications. Since direct application of image-based object detection cannot leverage the rich temporal information inherent in video data, we advocate to the detection of long-range video object pattern. While the Long Short-Term Memory (LSTM) has been the de facto choice for such detection, currently LSTM cannot fundamentally model object association between consecutive frames. In this paper, we propose the association LSTM to address this fundamental association problem. Association LSTM not only regresses and classifiy directly on object locations and categories but also associates features to represent each output object. By minimizing the matching error between these features, we learn how to associate objects in two consecutive frames. Additionally, our method works in an online manner, which is important for most video tasks. Compared to the traditional video object detection methods, our approach outperforms them on standard video datasets.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"107 1","pages":"2363-2371"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73700276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 101
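To make the association step concrete, here is a minimal sketch, not the authors' implementation, of matching per-object descriptors across two consecutive frames: descriptors are L2-normalized, a cosine-similarity matrix is formed, and a one-to-one assignment minimizing total dissimilarity gives the matching error to be minimized during training. The Hungarian matcher and the 1 - cosine error form are illustrative assumptions.

```python
# Hedged sketch: associate detected objects across consecutive frames by
# minimizing feature dissimilarity, per the abstract above.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(feats_t, feats_t1):
    """feats_t: (N, D), feats_t1: (M, D) per-object descriptors.
    Returns matched (i, j) index pairs and the mean matching error."""
    a = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)   # L2 normalize
    b = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    sim = a @ b.T                                  # (N, M) cosine similarities
    rows, cols = linear_sum_assignment(1.0 - sim)  # one-to-one, min dissimilarity
    error = float(np.mean(1.0 - sim[rows, cols]))  # quantity a trainer would minimize
    return list(zip(rows, cols)), error

pairs, err = associate(np.random.rand(5, 128), np.random.rand(6, 128))
```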
From Square Pieces to Brick Walls: The Next Challenge in Solving Jigsaw Puzzles
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.434
Shir Gur, O. Ben-Shahar
Research into computational jigsaw puzzle solving, an emerging theoretical problem with numerous applications, has focused in recent years on puzzles that consist of square pieces only. In this paper we wish to extend the scientific scope of appearance-based puzzle solving and consider "brick wall" jigsaw puzzles: rectangular pieces that may have different sizes and can be placed next to each other at an arbitrary offset along their abutting edge, a more explicit configuration with properties of real-world puzzles. We present the new challenges that arise in brick wall puzzles and address them in two stages. First, we concentrate on the reconstruction of the puzzle (with or without missing pieces) assuming an oracle for offset assignments. We show that despite the increased complexity of the problem, under these conditions performance can be made comparable to the state of the art in solving the simpler square-piece puzzles, and thereby argue that solving brick wall puzzles may be reduced to finding the correct offset between two neighboring pieces. We then focus on implementing the oracle computationally using a mixture of dissimilarity metrics and correlation matching. We show results on various brick wall puzzles and discuss how our work may open a new research path for the puzzle-solving community.
{"title":"From Square Pieces to Brick Walls: The Next Challenge in Solving Jigsaw Puzzles","authors":"Shir Gur, O. Ben-Shahar","doi":"10.1109/ICCV.2017.434","DOIUrl":"https://doi.org/10.1109/ICCV.2017.434","url":null,"abstract":"Research into computational jigsaw puzzle solving, an emerging theoretical problem with numerous applications, has focused in recent years on puzzles that constitute square pieces only. In this paper we wish to extend the scientific scope of appearance-based puzzle solving and consider ’’brick wall” jigsaw puzzles – rectangular pieces who may have different sizes, and could be placed next to each other at arbitrary offset along their abutting edge – a more explicit configuration with properties of real world puzzles. We present the new challenges that arise in brick wall puzzles and address them in two stages. First we concentrate on the reconstruction of the puzzle (with or without missing pieces) assuming an oracle for offset assignments. We show that despite the increased complexity of the problem, under these conditions performance can be made comparable to the state-of-the-art in solving the simpler square piece puzzles, and thereby argue that solving brick wall puzzles may be reduced to finding the correct offset between two neighboring pieces. We then move on to focus on implementing the oracle computationally using a mixture of dissimilarity metrics and correlation matching. We show results on various brick wall puzzles and discuss how our work may start a new research path for the puzzle solving community.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"5 1","pages":"4049-4057"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74256485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
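Since the paper reduces brick wall solving to finding the correct offset between neighboring pieces, here is a hedged sketch of such an offset search for two pieces assumed to be vertical neighbors. A plain SSD seam dissimilarity stands in for the paper's mixture of dissimilarity metrics and correlation matching.

```python
# Hedged sketch of an "offset oracle": slide the bottom piece along the top
# piece's lower edge and keep the offset with the smallest seam dissimilarity.
import numpy as np

def best_offset(top, bottom, max_shift):
    """top, bottom: (H, W, 3) float arrays; returns (offset, cost) where the
    cost is the mean SSD between top's last row and bottom's first row."""
    edge_a, edge_b = top[-1], bottom[0]              # abutting rows, (W, 3)
    best, best_cost = 0, np.inf
    for off in range(-max_shift, max_shift + 1):
        # Overlapping span of the two edges at this horizontal offset.
        a = edge_a[max(0, off): len(edge_a) + min(0, off)]
        b = edge_b[max(0, -off): len(edge_b) + min(0, -off)]
        n = min(len(a), len(b))
        if n == 0:
            continue
        cost = np.mean((a[:n] - b[:n]) ** 2)         # SSD per overlapping pixel
        if cost < best_cost:
            best, best_cost = off, cost
    return best, best_cost

off, cost = best_offset(np.random.rand(32, 64, 3), np.random.rand(32, 64, 3), 16)
```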
DualNet: Learn Complementary Features for Image Recognition
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.62
Saihui Hou, X. Liu, Zilei Wang
In this work we propose a novel framework named DualNet that aims to learn more accurate representations for image recognition. Two parallel neural networks are coordinated to learn complementary features, and thus a wider network is constructed. Specifically, we logically divide an end-to-end deep convolutional neural network into two functional parts, i.e., a feature extractor and an image classifier. The extractors of the two subnetworks are placed side by side, and together form the feature extractor of DualNet. The two-stream features are then aggregated by the final classifier for overall classification, while two auxiliary classifiers are appended behind the feature extractor of each subnetwork so that the separately learned features are discriminative on their own. The complementary constraint is imposed by weighting the three classifiers, which is the key to DualNet. A corresponding training strategy, consisting of iterative training and joint fine-tuning, is also proposed to make the two subnetworks cooperate well with each other. Finally, DualNet instantiated with the well-known CaffeNet, VGGNet, NIN, and ResNet is thoroughly investigated and experimentally evaluated on multiple datasets including CIFAR-100, Stanford Dogs, and UEC FOOD-100. The results demonstrate that DualNet helps learn more accurate image representations and thus yields higher recognition accuracy. In particular, the performance on CIFAR-100 is state-of-the-art compared to recent works.
{"title":"DualNet: Learn Complementary Features for Image Recognition","authors":"Saihui Hou, X. Liu, Zilei Wang","doi":"10.1109/ICCV.2017.62","DOIUrl":"https://doi.org/10.1109/ICCV.2017.62","url":null,"abstract":"In this work we propose a novel framework named Dual-Net aiming at learning more accurate representation for image recognition. Here two parallel neural networks are coordinated to learn complementary features and thus a wider network is constructed. Specifically, we logically divide an end-to-end deep convolutional neural network into two functional parts, i.e., feature extractor and image classifier. The extractors of two subnetworks are placed side by side, which exactly form the feature extractor of DualNet. Then the two-stream features are aggregated to the final classifier for overall classification, while two auxiliary classifiers are appended behind the feature extractor of each subnetwork to make the separately learned features discriminative alone. The complementary constraint is imposed by weighting the three classifiers, which is indeed the key of DualNet. The corresponding training strategy is also proposed, consisting of iterative training and joint finetuning, to make the two subnetworks cooperate well with each other. Finally, DualNet based on the well-known CaffeNet, VGGNet, NIN and ResNet are thoroughly investigated and experimentally evaluated on multiple datasets including CIFAR-100, Stanford Dogs and UEC FOOD-100. The results demonstrate that DualNet can really help learn more accurate image representation, and thus result in higher accuracy for recognition. In particular, the performance on CIFAR-100 is state-of-the-art compared to the recent works.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"123 1","pages":"502-510"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74797457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 71
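A minimal PyTorch sketch of the described architecture: two parallel extractors, a fused classifier on the concatenated two-stream features, and one auxiliary classifier per branch, trained with a weighted sum of the three classification losses. The toy backbone and the loss weights are placeholders, not the paper's CaffeNet/VGGNet/NIN/ResNet configurations.

```python
# Hedged sketch of the DualNet structure; backbones and weights are assumptions.
import torch
import torch.nn as nn

class DualNet(nn.Module):
    def __init__(self, extractor_a, extractor_b, feat_dim, n_classes):
        super().__init__()
        self.branch_a, self.branch_b = extractor_a, extractor_b
        self.fused = nn.Linear(2 * feat_dim, n_classes)   # overall classifier
        self.aux_a = nn.Linear(feat_dim, n_classes)       # auxiliary heads keep each
        self.aux_b = nn.Linear(feat_dim, n_classes)       # branch discriminative alone

    def forward(self, x):
        fa, fb = self.branch_a(x), self.branch_b(x)
        return self.fused(torch.cat([fa, fb], dim=1)), self.aux_a(fa), self.aux_b(fb)

def dualnet_loss(outputs, target, w=(1.0, 0.3, 0.3)):
    # Weighting the three classifiers imposes the complementary constraint
    # described in the abstract; the weight values here are assumed.
    ce = nn.functional.cross_entropy
    return sum(wi * ce(o, target) for wi, o in zip(w, outputs))

backbone = lambda: nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
net = DualNet(backbone(), backbone(), feat_dim=256, n_classes=100)
outs = net(torch.randn(4, 3, 32, 32))
loss = dualnet_loss(outs, torch.randint(0, 100, (4,)))
```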
Wavelet-SRNet: A Wavelet-Based CNN for Multi-scale Face Super Resolution
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.187
Huaibo Huang, R. He, Zhenan Sun, T. Tan
Most modern face super-resolution methods resort to convolutional neural networks (CNNs) to infer high-resolution (HR) face images. When dealing with very low-resolution (LR) images, the performance of these CNN-based methods degrades greatly. Meanwhile, these methods tend to produce over-smoothed outputs and miss some textural details. To address these challenges, this paper presents a wavelet-based CNN approach that can ultra-resolve a very low-resolution face image of 16 × 16 pixels or smaller to larger versions at multiple scaling factors (2×, 4×, 8×, and even 16×) in a unified framework. Different from conventional CNN methods that directly infer HR images, our approach first learns to predict the series of HR wavelet coefficients corresponding to the LR input, and then reconstructs the HR image from them. To capture both the global topology information and the local texture details of human faces, we present a flexible and extensible convolutional neural network with three types of loss: wavelet prediction loss, texture loss, and full-image loss. Extensive experiments demonstrate that the proposed approach achieves more appealing results, both quantitatively and qualitatively, than state-of-the-art super-resolution methods.
{"title":"Wavelet-SRNet: A Wavelet-Based CNN for Multi-scale Face Super Resolution","authors":"Huaibo Huang, R. He, Zhenan Sun, T. Tan","doi":"10.1109/ICCV.2017.187","DOIUrl":"https://doi.org/10.1109/ICCV.2017.187","url":null,"abstract":"Most modern face super-resolution methods resort to convolutional neural networks (CNN) to infer highresolution (HR) face images. When dealing with very low resolution (LR) images, the performance of these CNN based methods greatly degrades. Meanwhile, these methods tend to produce over-smoothed outputs and miss some textural details. To address these challenges, this paper presents a wavelet-based CNN approach that can ultra-resolve a very low resolution face image of 16 × 16 or smaller pixelsize to its larger version of multiple scaling factors (2×, 4×, 8× and even 16×) in a unified framework. Different from conventional CNN methods directly inferring HR images, our approach firstly learns to predict the LR’s corresponding series of HR’s wavelet coefficients before reconstructing HR images from them. To capture both global topology information and local texture details of human faces, we present a flexible and extensible convolutional neural network with three types of loss: wavelet prediction loss, texture loss and full-image loss. Extensive experiments demonstrate that the proposed approach achieves more appealing results both quantitatively and qualitatively than state-ofthe- art super-resolution methods.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"55 1","pages":"1698-1706"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84490771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 336
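To illustrate the reconstruction step only: if a network predicts the HR image's 2-D wavelet coefficients from the LR input, the HR image is recovered with an inverse wavelet transform. In this sketch the coefficients come from a ground-truth decomposition (PyWavelets, with a Haar basis assumed) purely to show the round trip; the paper's CNN and its choice of wavelet may differ.

```python
# Hedged sketch of wavelet-domain super-resolution's reconstruction step:
# coefficients in, HR image out via the inverse 2-D wavelet transform.
import numpy as np
import pywt

def reconstruct_hr(coeffs, wavelet="haar"):
    # coeffs: [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)],
    # i.e., what the CNN would be trained to predict from the LR input.
    return pywt.waverec2(coeffs, wavelet)

hr = np.random.rand(128, 128)                 # stand-in for an HR face image
coeffs = pywt.wavedec2(hr, "haar", level=3)   # ground-truth coefficients
assert np.allclose(reconstruct_hr(coeffs), hr)  # lossless round trip
```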
SHaPE: A Novel Graph Theoretic Algorithm for Making Consensus-Based Decisions in Person Re-identification Systems
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.127
Arko Barman, S. Shah
Person re-identification is a challenge in video-based surveillance where the goal is to identify the same person in different camera views. In recent years, many algorithms have been proposed that approach this problem by designing suitable feature representations for images of persons or by training appropriate distance metrics that learn to distinguish between images of different persons. Aggregating the results from multiple algorithms for person re-identification is a relatively less-explored area of research. In this paper, we formulate an algorithm that maps the ranking process in a person re-identification algorithm to a problem in graph theory. We then extend this formulation to allow the use of results from multiple algorithms to make a consensus-based decision for the person re-identification problem. The algorithm is unsupervised and takes into account only the matching scores generated by multiple algorithms for creating a consensus of results. Further, we show how the graph-theoretic problem can be solved by a two-step process. First, we obtain a rough estimate of the solution using a greedy algorithm. Then, we extend the construction of the proposed graph so that the problem can be efficiently solved by means of Ant Colony Optimization, a heuristic path-searching algorithm for complex graphs. While we present the algorithm in the context of person re-identification, it can potentially be applied to the general problem of ranking items based on a consensus of multiple sets of scores or metric values.
{"title":"SHaPE: A Novel Graph Theoretic Algorithm for Making Consensus-Based Decisions in Person Re-identification Systems","authors":"Arko Barman, S. Shah","doi":"10.1109/ICCV.2017.127","DOIUrl":"https://doi.org/10.1109/ICCV.2017.127","url":null,"abstract":"Person re-identification is a challenge in video-based surveillance where the goal is to identify the same person in different camera views. In recent years, many algorithms have been proposed that approach this problem by designing suitable feature representations for images of persons or by training appropriate distance metrics that learn to distinguish between images of different persons. Aggregating the results from multiple algorithms for person re-identification is a relatively less-explored area of research. In this paper, we formulate an algorithm that maps the ranking process in a person re-identification algorithm to a problem in graph theory. We then extend this formulation to allow for the use of results from multiple algorithms to make a consensus-based decision for the person re-identification problem. The algorithm is unsupervised and takes into account only the matching scores generated by multiple algorithms for creating a consensus of results. Further, we show how the graph theoretic problem can be solved by a two-step process. First, we obtain a rough estimate of the solution using a greedy algorithm. Then, we extend the construction of the proposed graph so that the problem can be efficiently solved by means of Ant Colony Optimization, a heuristic path-searching algorithm for complex graphs. While we present the algorithm in the context of person reidentification, it can potentially be applied to the general problem of ranking items based on a consensus of multiple sets of scores or metric values.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"83 1","pages":"1124-1133"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77220775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
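The following sketch covers only the consensus and greedy-estimate stages described above: each algorithm's matching-score matrix is min-max normalized, the matrices are summed, and a greedy assignment gives the rough solution. The normalization scheme is an assumption, and the Ant Colony Optimization refinement is omitted entirely.

```python
# Hedged sketch of consensus rank aggregation plus the greedy first pass.
import numpy as np

def greedy_consensus(score_matrices):
    """score_matrices: list of (Q, G) arrays, one per algorithm; higher score
    means a better query-gallery match. Assumes G >= Q. Returns the gallery
    index greedily assigned to each query."""
    consensus = np.zeros_like(score_matrices[0], dtype=float)
    for s in score_matrices:
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)  # comparable scales
        consensus += s
    assignment, free = {}, set(range(consensus.shape[1]))
    # Greedily commit the globally best remaining (query, gallery) pair.
    for q, g in zip(*np.unravel_index(np.argsort(-consensus, axis=None),
                                      consensus.shape)):
        if q not in assignment and g in free:
            assignment[q] = g
            free.remove(g)
    return [assignment[q] for q in range(consensus.shape[0])]

ranks = greedy_consensus([np.random.rand(4, 10) for _ in range(3)])
```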
Stepwise Metric Promotion for Unsupervised Video Person Re-identification
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.266
Zimo Liu, D. Wang, Huchuan Lu
The intensive annotation cost and the rich but unlabeled data contained in videos motivate us to propose an unsupervised video-based person re-identification (re-ID) method. We start from two assumptions: 1) different video tracklets typically contain different persons, given that the tracklets are taken at distinct places or with long intervals; 2) within each tracklet, the frames are mostly of the same person. Based on these assumptions, this paper proposes a stepwise metric promotion approach to estimate the identities of training tracklets, which iterates between cross-camera tracklet association and feature learning. Specifically, we use each training tracklet as a query and perform retrieval in the cross-camera training set. Our method is built on reciprocal nearest-neighbor search and can eliminate hard negative label matches, i.e., the cross-camera nearest neighbors of the false matches in the initial rank list. A tracklet that passes the reciprocal nearest-neighbor check is considered to have the same ID as the query. Experimental results on the PRID 2011, iLIDS-VID, and MARS datasets show that the proposed method achieves very competitive re-ID accuracy compared with its supervised counterparts.
{"title":"Stepwise Metric Promotion for Unsupervised Video Person Re-identification","authors":"Zimo Liu, D. Wang, Huchuan Lu","doi":"10.1109/ICCV.2017.266","DOIUrl":"https://doi.org/10.1109/ICCV.2017.266","url":null,"abstract":"The intensive annotation cost and the rich but unlabeled data contained in videos motivate us to propose an unsupervised video-based person re-identification (re-ID) method. We start from two assumptions: 1) different video tracklets typically contain different persons, given that the tracklets are taken at distinct places or with long intervals; 2) within each tracklet, the frames are mostly of the same person. Based on these assumptions, this paper propose a stepwise metric promotion approach to estimate the identities of training tracklets, which iterates between cross-camera tracklet association and feature learning. Specifically, We use each training tracklet as a query, and perform retrieval in the cross-camera training set. Our method is built on reciprocal nearest neighbor search and can eliminate the hard negative label matches, i.e., the cross-camera nearest neighbors of the false matches in the initial rank list. The tracklet that passes the reciprocal nearest neighbor check is considered to have the same ID with the query. Experimental results on the PRID 2011, ILIDS-VID, and MARS datasets show that the proposed method achieves very competitive re-ID accuracy compared with its supervised counterparts.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"145 1","pages":"2448-2457"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78359174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 163
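A small sketch of the reciprocal nearest-neighbor check at the core of the tracklet association: a cross-camera match is kept only when the two tracklets are mutual nearest neighbors. Euclidean distance on tracklet descriptors is an assumption here; the paper's learned metric replaces it as training iterates.

```python
# Hedged sketch: keep only mutual nearest-neighbor tracklet pairs, which is
# how hard negative label matches get eliminated per the abstract.
import numpy as np

def reciprocal_matches(feats_a, feats_b):
    """feats_a: (Na, D), feats_b: (Nb, D) tracklet descriptors from two
    cameras. Returns the (i, j) pairs that are mutual nearest neighbors."""
    d = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=2)
    nn_ab = d.argmin(axis=1)   # nearest camera-B tracklet for each A tracklet
    nn_ba = d.argmin(axis=0)   # nearest camera-A tracklet for each B tracklet
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

pairs = reciprocal_matches(np.random.rand(8, 64), np.random.rand(10, 64))
```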
Cascaded Feature Network for Semantic Segmentation of RGB-D Images
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.147
Di Lin, Guangyong Chen, D. Cohen-Or, P. Heng, Hui Huang
The fully convolutional network (FCN) has been successfully applied to the semantic segmentation of scenes represented with RGB images. Images augmented with a depth channel provide more of the geometric information of the scene in the image. The question is how to best exploit this additional information to improve segmentation performance. In this paper, we present a neural network with multiple branches for segmenting RGB-D images. Our approach is to use the available depth to split the image into layers with common visual characteristics of objects/scenes, or a common "scene-resolution". We introduce the context-aware receptive field (CaRF), which provides better control over the relevant contextual information of the learned features. Equipped with CaRF, each branch of the network semantically segments regions of a similar scene-resolution, leading to a more focused domain that is easier to learn. Furthermore, our network is cascaded, with features from one branch augmenting the features of the adjacent branch. We show that such cascading of features enriches the contextual information of each branch and enhances the overall performance. The accuracy our network achieves outperforms state-of-the-art methods on two public datasets.
{"title":"Cascaded Feature Network for Semantic Segmentation of RGB-D Images","authors":"Di Lin, Guangyong Chen, D. Cohen-Or, P. Heng, Hui Huang","doi":"10.1109/ICCV.2017.147","DOIUrl":"https://doi.org/10.1109/ICCV.2017.147","url":null,"abstract":"Fully convolutional network (FCN) has been successfully applied in semantic segmentation of scenes represented with RGB images. Images augmented with depth channel provide more understanding of the geometric information of the scene in the image. The question is how to best exploit this additional information to improve the segmentation performance.,,In this paper, we present a neural network with multiple branches for segmenting RGB-D images. Our approach is to use the available depth to split the image into layers with common visual characteristic of objects/scenes, or common “scene-resolution”. We introduce context-aware receptive field (CaRF) which provides a better control on the relevant contextual information of the learned features. Equipped with CaRF, each branch of the network semantically segments relevant similar scene-resolution, leading to a more focused domain which is easier to learn. Furthermore, our network is cascaded with features from one branch augmenting the features of adjacent branch. We show that such cascading of features enriches the contextual information of each branch and enhances the overall performance. The accuracy that our network achieves outperforms the stateof-the-art methods on two public datasets.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"2014 1","pages":"1320-1328"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73294305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 107
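A hedged sketch of the depth-driven split that routes pixels to branches: depth is quantized into bands of common "scene-resolution", producing one boolean mask per branch. The band edges below are placeholders; how the paper actually derives its layers is not reproduced here.

```python
# Hedged sketch: partition an RGB-D image into depth layers, one per branch.
import numpy as np

def depth_layers(depth, edges=(1.0, 3.0)):
    """depth: (H, W) array in meters; edges: assumed band boundaries.
    Returns one boolean mask per band; each mask selects the pixels that a
    dedicated branch would segment."""
    bounds = (-np.inf,) + tuple(edges) + (np.inf,)
    return [(depth > lo) & (depth <= hi)
            for lo, hi in zip(bounds[:-1], bounds[1:])]

masks = depth_layers(np.random.uniform(0.5, 5.0, (480, 640)))
assert np.all(sum(m.astype(int) for m in masks) == 1)  # masks partition pixels
```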
Depth and Image Restoration from Light Field in a Scattering Medium
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.263
Jiandong Tian, Zak Murez, Tong Cui, Zhen Zhang, D. Kriegman, R. Ramamoorthi
Traditional imaging methods and computer vision algorithms are often ineffective when images are acquired in scattering media, such as underwater, fog, and biological tissue. Here, we explore the use of light field imaging and algorithms for image restoration and depth estimation that address the image degradation caused by the medium. Towards this end, we make the following three contributions. First, we present a new single-image restoration algorithm which removes backscatter and attenuation from images better than existing methods do, and apply it to each view in the light field. Second, we combine a novel transmission-based depth cue with existing correspondence and defocus cues to improve light field depth estimation. In densely scattering media, our transmission depth cue is critical for depth estimation, since the images have low signal-to-noise ratios, which significantly degrades the performance of the correspondence and defocus cues. Finally, we propose shearing and refocusing multiple views of the light field to recover a single image of higher quality than is possible from a single view. We demonstrate the benefits of our method through extensive experimental results in a water tank.
{"title":"Depth and Image Restoration from Light Field in a Scattering Medium","authors":"Jiandong Tian, Zak Murez, Tong Cui, Zhen Zhang, D. Kriegman, R. Ramamoorthi","doi":"10.1109/ICCV.2017.263","DOIUrl":"https://doi.org/10.1109/ICCV.2017.263","url":null,"abstract":"Traditional imaging methods and computer vision algorithms are often ineffective when images are acquired in scattering media, such as underwater, fog, and biological tissue. Here, we explore the use of light field imaging and algorithms for image restoration and depth estimation that address the image degradation from the medium. Towards this end, we make the following three contributions. First, we present a new single image restoration algorithm which removes backscatter and attenuation from images better than existing methods do, and apply it to each view in the light field. Second, we combine a novel transmission based depth cue with existing correspondence and defocus cues to improve light field depth estimation. In densely scattering media, our transmission depth cue is critical for depth estimation since the images have low signal to noise ratios which significantly degrades the performance of the correspondence and defocus cues. Finally, we propose shearing and refocusing multiple views of the light field to recover a single image of higher quality than what is possible from a single view. We demonstrate the benefits of our method through extensive experimental results in a water tank.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"2420-2429"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80435744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
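For intuition, here is a sketch that inverts a standard single-scattering image-formation model, I = J exp(-c d) + B_inf (1 - exp(-c d)); note that the transmission t = exp(-c d) doubles as a depth cue, since d = -ln(t)/c. The paper estimates backscatter and attenuation from the light field itself, whereas this sketch assumes c, d, and B_inf are already known.

```python
# Hedged sketch: undo backscatter and attenuation under a known scattering
# model; in the paper these quantities are estimated, not given.
import numpy as np

def restore(I, depth, c, B_inf):
    """I: (H, W, 3) observed image in [0, 1]; depth: (H, W) in meters;
    c: per-channel attenuation coefficients (3,); B_inf: veiling light (3,)."""
    t = np.exp(-depth[..., None] * np.asarray(c))        # transmission map
    J = (I - np.asarray(B_inf) * (1.0 - t)) / np.maximum(t, 1e-3)
    return np.clip(J, 0.0, 1.0)                          # restored radiance

J = restore(np.random.rand(10, 10, 3), np.random.uniform(1, 5, (10, 10)),
            c=(0.4, 0.1, 0.05), B_inf=(0.1, 0.3, 0.4))
```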
Following Gaze in Video
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.160
Adrià Recasens, Carl Vondrick, A. Khosla, A. Torralba
Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking, even when the object is in a different frame. We collect VideoGaze, a new dataset which we use as a benchmark to both train and evaluate models. Given one frame with a person in it, our model estimates a density over gaze locations in every frame, together with the probability that the person is looking in that particular frame. A key aspect of our approach is an end-to-end model that jointly estimates saliency, gaze pose, and the geometric relationships between views, while using only gaze as supervision. Visualizations suggest that the model learns to solve these intermediate tasks internally and automatically, without additional supervision. Experiments show that our approach follows gaze in video better than existing approaches, enabling a richer understanding of human activities in video.
{"title":"Following Gaze in Video","authors":"Adrià Recasens, Carl Vondrick, A. Khosla, A. Torralba","doi":"10.1109/ICCV.2017.160","DOIUrl":"https://doi.org/10.1109/ICCV.2017.160","url":null,"abstract":"Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame. We collect VideoGaze, a new dataset which we use as a benchmark to both train and evaluate models. Given one frame with a person in it, our model estimates a density for gaze location in every frame and the probability that the person is looking in that particular frame. A key aspect of our approach is an end-to-end model that jointly estimates: saliency, gaze pose, and geometric relationships between views while only using gaze as supervision. Visualizations suggest that the model learns to internally solve these intermediate tasks automatically without additional supervision. Experiments show that our approach follows gaze in video better than existing approaches, enabling a richer understanding of human activities in video.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"42 1","pages":"1444-1452"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80450246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 65
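A speculative sketch of fusing the two cues the abstract names: a saliency map over the target frame and a gaze-direction mask derived from the person's head are combined multiplicatively into a density over gaze locations. The multiplicative fusion is an assumption borrowed from earlier gaze-following work, not necessarily this paper's exact output head, and both input maps are stand-ins.

```python
# Hedged sketch: fuse saliency and gaze-direction cues into a gaze density.
import numpy as np

def gaze_density(saliency, gaze_mask):
    """saliency, gaze_mask: (H, W) non-negative maps for the target frame.
    Returns a normalized density over pixel locations."""
    density = saliency * gaze_mask          # both cues must agree on a location
    total = density.sum()
    if total == 0:                          # degenerate case: uniform density
        return np.full_like(density, 1.0 / density.size)
    return density / total

d = gaze_density(np.random.rand(64, 64), np.random.rand(64, 64))
```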
Temporal Superpixels Based on Proximity-Weighted Patch Matching
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.390
Se-Ho Lee, Won-Dong Jang, Chang-Su Kim
A temporal superpixel algorithm based on proximity-weighted patch matching (TS-PPM) is proposed in this work. We develop proximity-weighted patch matching (PPM), which robustly estimates the motion vector of a superpixel by considering the patch-matching distances of the neighboring superpixels as well as the target superpixel. In each frame, we initialize superpixels by transferring the superpixel labels of the previous frame using the PPM motion vectors. Then, we update the superpixel labels of boundary pixels based on a cost function composed of color, spatial, contour, and temporal consistency terms. Finally, we execute superpixel splitting, merging, and relabeling to regularize superpixel sizes and reduce incorrect labels. Experiments show that the proposed algorithm significantly outperforms state-of-the-art conventional algorithms.
{"title":"Temporal Superpixels Based on Proximity-Weighted Patch Matching","authors":"Se-Ho Lee, Won-Dong Jang, Chang-Su Kim","doi":"10.1109/ICCV.2017.390","DOIUrl":"https://doi.org/10.1109/ICCV.2017.390","url":null,"abstract":"A temporal superpixel algorithm based on proximity-weighted patch matching (TS-PPM) is proposed in this work. We develop the proximity-weighted patch matching (PPM), which estimates the motion vector of a superpixel robustly, by considering the patch matching distances of neighboring superpixels as well as the target superpixel. In each frame, we initialize superpixels by transferring the superpixel labels of the previous frame using PPM motion vectors. Then, we update the superpixel labels of boundary pixels, based on a cost function, composed of color, spatial, contour, and temporal consistency terms. Finally, we execute superpixel splitting, merging, and relabeling to regularize superpixel sizes and reduce incorrect labels. Experiments show that the proposed algorithm outperforms the state-of-the-art conventional algorithms significantly.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"31 1","pages":"3630-3638"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82583263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
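A sketch of the PPM cost: the motion vector chosen for a target superpixel minimizes a proximity-weighted sum of patch-matching distances over the target and its neighbors, so nearby superpixels vote more strongly. The Gaussian proximity weighting and the `next_frame_patch` helper are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of proximity-weighted patch matching for one superpixel.
import numpy as np

def ppm_motion(patches_t, centers, next_frame_patch, candidates, sigma=20.0):
    """patches_t: list of (h, w, 3) patches, index 0 being the target
    superpixel; centers: (N, 2) array of superpixel centers; next_frame_patch
    is a hypothetical callable returning the patch at center + v in frame
    t + 1; candidates: iterable of (dy, dx) motion vectors to test."""
    # Gaussian proximity weights: the target weighs 1, neighbors decay with
    # distance from the target's center (weighting scheme is an assumption).
    w = np.exp(-np.linalg.norm(centers - centers[0], axis=1) ** 2
               / (2 * sigma ** 2))
    best_v, best_cost = None, np.inf
    for v in candidates:
        # Weighted sum of per-superpixel patch-matching distances at motion v.
        cost = sum(wi * np.mean((p - next_frame_patch(c, v)) ** 2)
                   for wi, p, c in zip(w, patches_t, centers))
        if cost < best_cost:
            best_v, best_cost = v, cost
    return best_v
```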