
Latest publications from the 2017 IEEE International Conference on Computer Vision (ICCV)

Joint Convolutional Analysis and Synthesis Sparse Representation for Single Image Layer Separation
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.189
Shuhang Gu, Deyu Meng, W. Zuo, Lei Zhang
Analysis sparse representation (ASR) and synthesis sparse representation (SSR) are two representative approaches for sparsity-based image modeling. An image is described mainly by the non-zero coefficients in SSR, while it is mainly characterized by the indices of zeros in ASR. To exploit the complementary representation mechanisms of ASR and SSR, we integrate the two models and propose a joint convolutional analysis and synthesis (JCAS) sparse representation model. The convolutional implementation is adopted to more effectively exploit the image global information. In JCAS, a single image is decomposed into two layers, one approximated by ASR to represent image large-scale structures, and the other by SSR to represent image fine-scale textures. The synthesis dictionary is adaptively learned in JCAS to describe the texture patterns for different single image layer separation tasks. We evaluate the proposed JCAS model on a variety of applications, including rain streak removal, high dynamic range image tone mapping, etc. The results show that our JCAS method outperforms state-of-the-art methods in these applications in terms of both quantitative measures and visual perception quality.
{"title":"Joint Convolutional Analysis and Synthesis Sparse Representation for Single Image Layer Separation","authors":"Shuhang Gu, Deyu Meng, W. Zuo, Lei Zhang","doi":"10.1109/ICCV.2017.189","DOIUrl":"https://doi.org/10.1109/ICCV.2017.189","url":null,"abstract":"Analysis sparse representation (ASR) and synthesis sparse representation (SSR) are two representative approaches for sparsity-based image modeling. An image is described mainly by the non-zero coefficients in SSR, while is mainly characterized by the indices of zeros in ASR. To exploit the complementary representation mechanisms of ASR and SSR, we integrate the two models and propose a joint convolutional analysis and synthesis (JCAS) sparse representation model. The convolutional implementation is adopted to more effectively exploit the image global information. In JCAS, a single image is decomposed into two layers, one is approximated by ASR to represent image large-scale structures, and the other by SSR to represent image fine-scale textures. The synthesis dictionary is adaptively learned in JCAS to describe the texture patterns for different single image layer separation tasks. We evaluate the proposed JCAS model on a variety of applications, including rain streak removal, high dynamic range image tone mapping, etc. The results show that our JCAS method outperforms state-of-the-arts in these applications in terms of both quantitative measure and visual perception quality.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"51 1","pages":"1717-1725"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76184818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 167
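As a rough illustration of the analysis/synthesis split described above, the toy Python sketch below alternates between an ISTA update for a convolutional synthesis texture code and a gradient step for a smooth structure layer. It is not the authors' algorithm: a quadratic smoothness term stands in for the sparse analysis prior, a single fixed high-pass filter `d` stands in for the learned dictionary, and `lam_a`, `lam_s`, and `step` are illustrative values.

```python
import numpy as np
from scipy.signal import fftconvolve

def soft(x, t):
    """Soft-thresholding operator used by ISTA."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def laplacian(u):
    """Discrete Laplacian via the standard 5-point stencil."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    return fftconvolve(u, k, mode="same")

def jcas_toy(y, d, n_iter=200, lam_a=0.5, lam_s=0.05, step=0.1):
    """Split image y into a smooth structure layer u (analysis-style prior)
    and a texture layer d * z with a sparse synthesis code z."""
    u = y.copy()
    z = np.zeros_like(y)
    for _ in range(n_iter):
        # ISTA step on z for the residual y - u (synthesis part).
        r = y - u
        v = fftconvolve(z, d, mode="same")
        grad_z = fftconvolve(v - r, d[::-1, ::-1], mode="same")  # adjoint ~ correlation
        z = soft(z - step * grad_z, step * lam_s)
        v = fftconvolve(z, d, mode="same")
        # Gradient step on u for 0.5*||u - (y - v)||^2 + lam_a*||grad u||^2 (analysis part).
        u = u - step * ((u - (y - v)) - 2.0 * lam_a * laplacian(u))
    return u, fftconvolve(z, d, mode="same")

# Example: separate a smooth ramp from a high-frequency oscillation.
xx, _ = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
image = xx + 0.2 * np.sin(40 * xx)
texture_filter = np.array([[1.0, -1.0]])   # crude high-pass stand-in for a dictionary atom
structure, texture = jcas_toy(image, texture_filter)
```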
Self-Organized Text Detection with Minimal Post-processing via Border Learning
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.535
Yue Wu, P. Natarajan
In this paper we propose a new solution to the text detection problem via border learning. Specifically, we make four major contributions: 1) We analyze the insufficiencies of the classic non-text and text settings for text detection. 2) We introduce the border class to the text detection problem for the first time, and validate that the decoding process is largely simplified with the help of text borders. 3) We collect and release a new text detection PPT dataset containing 10,692 images with non-text, border, and text annotations. 4) We develop a lightweight (only 0.28M parameters), fully convolutional network (FCN) to effectively learn borders in text images. The results of our extensive experiments show that the proposed solution achieves comparable performance, and often outperforms state-of-the-art approaches on standard benchmarks, even though our solution only requires minimal post-processing to parse a bounding box from a detected text map, while others often require heavy post-processing.
{"title":"Self-Organized Text Detection with Minimal Post-processing via Border Learning","authors":"Yue Wu, P. Natarajan","doi":"10.1109/ICCV.2017.535","DOIUrl":"https://doi.org/10.1109/ICCV.2017.535","url":null,"abstract":"In this paper we propose a new solution to the text detection problem via border learning. Specifically, we make four major contributions: 1) We analyze the insufficiencies of the classic non-text and text settings for text detection. 2) We introduce the border class to the text detection problem for the first time, and validate that the decoding process is largely simplified with the help of text border. 3) We collect and release a new text detection PPT dataset containing 10,692 images with non-text, border, and text annotations. 4) We develop a lightweight (only 0.28M parameters), fully convolutional network (FCN) to effectively learn borders in text images. The results of our extensive experiments show that the proposed solution achieves comparable performance, and often outperforms state-of-theart approaches on standard benchmarks–even though our solution only requires minimal post-processing to parse a bounding box from a detected text map, while others often require heavy post-processing.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"2 1","pages":"5010-5019"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82794502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 75
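The border-class idea can be sketched as a three-way per-pixel classifier. The minimal PyTorch model below is only a stand-in for the paper's 0.28M-parameter FCN; `TinyBorderFCN`, its layer widths, and the 0/1/2 label convention are invented for illustration.

```python
import torch
import torch.nn as nn

class TinyBorderFCN(nn.Module):
    """Fully convolutional net predicting per-pixel non-text / border / text scores."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(32, n_classes, 1)   # 1x1 conv -> class scores
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, x):
        return self.upsample(self.classifier(self.features(x)))

net = TinyBorderFCN()
print(sum(p.numel() for p in net.parameters()))         # parameter count of this toy model
logits = net(torch.randn(1, 3, 256, 256))               # -> (1, 3, 256, 256)
label_map = logits.argmax(dim=1)                        # 0 = non-text, 1 = border, 2 = text
```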
Modeling Urban Scenes from Pointclouds
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.414
William Nguatem, H. Mayer
We present a method for Modeling Urban Scenes from Pointclouds (MUSP). In contrast to existing approaches, MUSP is robust, scalable and provides a more complete description by not making a Manhattan-World assumption and by modeling both buildings (with polyhedra) and the non-planar ground (using NURBS). First, we segment the scene into consistent patches using a divide-and-conquer based algorithm within a nonparametric Bayesian framework (stick-breaking construction). These patches often correspond to meaningful structures, such as the ground, facades, roofs and roof superstructures. We use polygon sweeping to fit predefined templates for buildings, and for the ground, a NURBS surface is fit and uniformly tessellated. Finally, we apply Boolean operations to the polygons for buildings, building parts and the tessellated ground to clip unnecessary geometry (e.g., facade protrusions below the non-planar ground), leading to the final model. The explicit Bayesian formulation of scene segmentation makes our approach suitable for challenging datasets with varying amounts of noise, outliers, and point density. We demonstrate the robustness of MUSP on 3D pointclouds from image matching as well as LiDAR.
{"title":"Modeling Urban Scenes from Pointclouds","authors":"William Nguatem, H. Mayer","doi":"10.1109/ICCV.2017.414","DOIUrl":"https://doi.org/10.1109/ICCV.2017.414","url":null,"abstract":"We present a method for Modeling Urban Scenes from Pointclouds (MUSP). In contrast to existing approaches, MUSP is robust, scalable and provides a more complete description by not making a Manhattan-World assumption and modeling both buildings (with polyhedra) as well as the non-planar ground (using NURBS). First, we segment the scene into consistent patches using a divide-and-conquer based algorithm within a nonparametric Bayesian framework (stick-breaking construction). These patches often correspond to meaningful structures, such as the ground, facades, roofs and roof superstructures. We use polygon sweeping to fit predefined templates for buildings, and for the ground, a NURBS surface is fit and uniformly tessellated. Finally, we apply boolean operations to the polygons for buildings, buildings parts and the tesselated ground to clip unnecessary geometry (e.g., facades protrusions below the non-planar ground), leading to the final model. The explicit Bayesian formulation of scene segmentation makes our approach suitable for challenging datasets with varying amounts of noise, outliers, and point density. We demonstrate the robustness of MUSP on 3D pointclouds from image matching as well as LiDAR.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"3 1","pages":"3857-3866"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91545183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
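The "stick-breaking construction" mentioned in the abstract is the standard truncated construction of Dirichlet-process mixture weights; a self-contained NumPy version is shown below. The concentration `alpha` and truncation level `k_max` are illustrative, and the final patch-assignment line is only a toy use of the weights, not the paper's segmentation procedure.

```python
import numpy as np

def stick_breaking_weights(alpha, k_max, rng):
    """Truncated stick-breaking construction: draw beta_i ~ Beta(1, alpha) and
    assign weight_i = beta_i * prod_{j<i} (1 - beta_j)."""
    betas = rng.beta(1.0, alpha, size=k_max)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, k_max=20, rng=rng)
print(w.sum())                                               # < 1; leftover mass = untruncated tail
patch_ids = rng.choice(len(w), size=1000, p=w / w.sum())     # toy assignment of points to patches
```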
Primary Video Object Segmentation via Complementary CNNs and Neighborhood Reversible Flow
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.158
Jia Li, Anlin Zheng, Xiaowu Chen, Bin Zhou
This paper proposes a novel approach for segmenting primary video objects by using Complementary Convolutional Neural Networks (CCNN) and neighborhood reversible flow. The proposed approach first pre-trains CCNN on massive images with manually annotated salient objects in an end-to-end manner, and the trained CCNN has two separate branches that simultaneously handle two complementary tasks, i.e., foregroundness and backgroundness estimation. By applying CCNN on each video frame, the spatial foregroundness and backgroundness maps can be initialized, which are then propagated between various frames so as to segment primary video objects and suppress distractors. To enforce efficient temporal propagation, we divide each frame into superpixels and construct neighborhood reversible flow that reflects the most reliable temporal correspondences between superpixels in far-away frames. Within such flow, the initialized foregroundness and backgroundness can be efficiently and accurately propagated along the temporal axis so that primary video objects gradually pop out and distractors are well suppressed. Extensive experimental results on three video datasets show that the proposed approach achieves impressive performance in comparisons with 18 state-of-the-art models.
{"title":"Primary Video Object Segmentation via Complementary CNNs and Neighborhood Reversible Flow","authors":"Jia Li, Anlin Zheng, Xiaowu Chen, Bin Zhou","doi":"10.1109/ICCV.2017.158","DOIUrl":"https://doi.org/10.1109/ICCV.2017.158","url":null,"abstract":"This paper proposes a novel approach for segmenting primary video objects by using Complementary Convolutional Neural Networks (CCNN) and neighborhood reversible flow. The proposed approach first pre-trains CCNN on massive images with manually annotated salient objects in an end-to-end manner, and the trained CCNN has two separate branches that simultaneously handle two complementary tasks, i.e., foregroundness and backgroundness estimation. By applying CCNN on each video frame, the spatial foregroundness and backgroundness maps can be initialized, which are then propagated between various frames so as to segment primary video objects and suppress distractors. To enforce efficient temporal propagation, we divide each frame into superpixels and construct neighborhood reversible flow that reflects the most reliable temporal correspondences between superpixels in far-away frames. Within such flow, the initialized foregroundness and backgroundness can be efficiently and accurately propagated along the temporal axis so that primary video objects gradually pop-out and distractors are well suppressed. Extensive experimental results on three video datasets show that the proposed approach achieves impressive performance in comparisons with 18 state-of-the-art models.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"96 17","pages":"1426-1434"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91406898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
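A much-simplified picture of "neighborhood reversible flow" is mutual nearest-neighbor matching between superpixel descriptors of consecutive frames, with foregroundness scores averaged across the retained matches. The NumPy sketch below follows that reading only; the random descriptors, `k`, and the 0.5 default score are assumptions, not the paper's construction.

```python
import numpy as np

def reversible_matches(feat_a, feat_b, k=3):
    """Keep (i, j) pairs that lie within each other's k nearest neighbors in
    both directions; a toy stand-in for neighborhood reversible flow."""
    d = np.linalg.norm(feat_a[:, None, :] - feat_b[None, :, :], axis=-1)
    knn_ab = np.argsort(d, axis=1)[:, :k]        # for each a-superpixel: k nearest in b
    knn_ba = np.argsort(d, axis=0)[:k, :].T      # for each b-superpixel: k nearest in a
    return [(i, j) for i in range(len(feat_a)) for j in knn_ab[i] if i in knn_ba[j]]

def propagate_scores(score_a, pairs, n_b):
    """Average the foregroundness of matched superpixels into the next frame."""
    acc, cnt = np.zeros(n_b), np.zeros(n_b)
    for i, j in pairs:
        acc[j] += score_a[i]
        cnt[j] += 1
    return np.divide(acc, cnt, out=np.full(n_b, 0.5), where=cnt > 0)  # 0.5 = unmatched / unknown

rng = np.random.default_rng(1)
feat_t, feat_t1 = rng.normal(size=(200, 16)), rng.normal(size=(200, 16))
fg_t = rng.uniform(size=200)                     # per-superpixel foregroundness at frame t
fg_t1 = propagate_scores(fg_t, reversible_matches(feat_t, feat_t1), 200)
```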
Multi-view Non-rigid Refinement and Normal Selection for High Quality 3D Reconstruction
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.261
Sk. Mohammadul Haque, V. Govindu
In recent years, there have been a variety of proposals for high quality 3D reconstruction by fusion of depth and normal maps that contain good low and high frequency information respectively. Typically, these methods create an initial mesh representation of the complete object or scene being scanned. Subsequently, normal estimates are assigned to each mesh vertex and a mesh-normal fusion step is carried out. In this paper, we present a complete pipeline for such depth-normal fusion. The key innovations in our pipeline are twofold. Firstly, we introduce a global multi-view non-rigid refinement step that corrects for the non-rigid misalignment present in the depth and normal maps. We demonstrate that such a correction is crucial for preserving fine-scale 3D features in the final reconstruction. Secondly, despite adequate care, the averaging of multiple normals invariably results in blurring of 3D detail. To mitigate this problem, we propose an approach that selects one out of many available normals. Our global cost for normal selection incorporates a variety of desirable properties and can be efficiently solved using graph cuts. We demonstrate the efficacy of our approach in generating high quality 3D reconstructions of both synthetic and real 3D models and compare with existing methods in the literature.
{"title":"Multi-view Non-rigid Refinement and Normal Selection for High Quality 3D Reconstruction","authors":"Sk. Mohammadul Haque, V. Govindu","doi":"10.1109/ICCV.2017.261","DOIUrl":"https://doi.org/10.1109/ICCV.2017.261","url":null,"abstract":"In recent years, there have been a variety of proposals for high quality 3D reconstruction by fusion of depth and normal maps that contain good low and high frequency information respectively. Typically, these methods create an initial mesh representation of the complete object or scene being scanned. Subsequently, normal estimates are assigned to each mesh vertex and a mesh-normal fusion step is carried out. In this paper, we present a complete pipeline for such depth-normal fusion. The key innovations in our pipeline are twofold. Firstly, we introduce a global multi-view non-rigid refinement step that corrects for the non-rigid misalignment present in the depth and normal maps. We demonstrate that such a correction is crucial for preserving fine-scale 3D features in the final reconstruction. Secondly, despite adequate care, the averaging of multiple normals invariably results in blurring of3D detail. To mitigate this problem, we propose an approach that selects one out of many available normals. Our global cost for normal selection incorporates a variety of desirable properties and can be efficiently solved using graph cuts. We demonstrate the efficacy of our approach in generating high quality 3D reconstructions of both synthetic and real 3D models and compare with existing methods in the literature.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"78 1","pages":"2401-2409"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83071919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
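The paper solves normal selection with graph cuts; the sketch below keeps the flavor of such an energy (a per-vertex data cost plus a pairwise disagreement cost along mesh edges) but minimizes it with a simple iterated-conditional-modes sweep instead. All inputs (`candidates`, `neighbors`, `lam`) are synthetic stand-ins, and the data cost is an assumed one, not the paper's.

```python
import numpy as np

def select_normals(candidates, neighbors, lam=0.5, n_sweeps=5):
    """Pick one normal per vertex from `candidates` (V, K, 3) by minimizing a data
    cost (disagreement with the candidate mean) plus a pairwise cost penalizing
    differing normals across edges. ICM stands in for the paper's graph cuts."""
    v_count, _, _ = candidates.shape
    mean_n = candidates.mean(axis=1)
    mean_n /= np.linalg.norm(mean_n, axis=1, keepdims=True)
    data_cost = 1.0 - np.einsum("vkc,vc->vk", candidates, mean_n)        # (V, K)
    labels = data_cost.argmin(axis=1)
    for _ in range(n_sweeps):
        for v in range(v_count):
            nb = neighbors[v]
            nb_normals = candidates[nb, labels[nb]]                      # (|nb|, 3)
            pair_cost = (1.0 - candidates[v] @ nb_normals.T).sum(axis=1) # (K,)
            labels[v] = np.argmin(data_cost[v] + lam * pair_cost)
    return labels

rng = np.random.default_rng(0)
cands = rng.normal(size=(100, 4, 3))
cands /= np.linalg.norm(cands, axis=2, keepdims=True)        # unit candidate normals
nbrs = [[(v - 1) % 100, (v + 1) % 100] for v in range(100)]  # toy 1-ring neighborhood
chosen = select_normals(cands, nbrs)
```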
Learning-Based Cloth Material Recovery from Video
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.470
Shan Yang, Junbang Liang, M. Lin
Image and video understanding enables better reconstruction of the physical world. Existing methods focus largely on geometry and visual appearance of the reconstructed scene. In this paper, we extend the frontier in image understanding and present a method to recover the material properties of cloth from a video. Previous cloth material recovery methods often require markers or complex experimental set-up to acquire physical properties, or are limited to certain types of images or videos. Our approach takes advantage of the appearance changes of the moving cloth to infer its physical properties. To extract information about the cloth, our method characterizes both the motion and the visual appearance of the cloth geometry. We apply the Convolutional Neural Network (CNN) and the Long Short Term Memory (LSTM) neural network to material recovery of cloth from videos. We also exploit simulated data to help statistical learning of mapping between the visual appearance and material type of the cloth. The effectiveness of our method is demonstrated via validation using both the simulated datasets and the real-life recorded videos.
{"title":"Learning-Based Cloth Material Recovery from Video","authors":"Shan Yang, Junbang Liang, M. Lin","doi":"10.1109/ICCV.2017.470","DOIUrl":"https://doi.org/10.1109/ICCV.2017.470","url":null,"abstract":"Image and video understanding enables better reconstruction of the physical world. Existing methods focus largely on geometry and visual appearance of the reconstructed scene. In this paper, we extend the frontier in image understanding and present a method to recover the material properties of cloth from a video. Previous cloth material recovery methods often require markers or complex experimental set-up to acquire physical properties, or are limited to certain types of images or videos. Our approach takes advantages of the appearance changes of the moving cloth to infer its physical properties. To extract information about the cloth, our method characterizes both the motion and the visual appearance of the cloth geometry. We apply the Convolutional Neural Network (CNN) and the Long Short Term Memory (LSTM) neural network to material recovery of cloth from videos. We also exploit simulated data to help statistical learning of mapping between the visual appearance and material type of the cloth. The effectiveness of our method is demonstrated via validation using both the simulated datasets and the real-life recorded videos.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"12 1","pages":"4393-4403"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88773290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
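A minimal CNN-plus-LSTM pipeline of the kind described (per-frame convolutional features, a recurrent pass over the clip, a material classifier on the final state) can be written in a few lines of PyTorch. The layer sizes, clip length, and number of material classes below are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ClothMaterialNet(nn.Module):
    """Per-frame CNN features fed to an LSTM; the last hidden state is classified
    into a material class."""
    def __init__(self, n_materials=10, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_materials)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))      # (B*T, feat_dim)
        seq = feats.view(b, t, -1)                # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(seq)              # h_n: (1, B, hidden)
        return self.head(h_n[-1])                 # (B, n_materials)

model = ClothMaterialNet()
logits = model(torch.randn(2, 16, 3, 64, 64))     # two clips of 16 frames each
```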
Robust Object Tracking Based on Temporal and Spatial Deep Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.130
Zhu Teng, Junliang Xing, Qiang Wang, Congyan Lang, Songhe Feng, Yi Jin
Recently, deep neural networks have been widely employed to deal with the visual tracking problem. In this work, we present a new deep architecture which incorporates the temporal and spatial information to boost the tracking performance. Our deep architecture contains three networks, a Feature Net, a Temporal Net, and a Spatial Net. The Feature Net extracts general feature representations of the target. With these feature representations, the Temporal Net encodes the trajectory of the target and directly learns temporal correspondences to estimate the object state from a global perspective. Based on the learning results of the Temporal Net, the Spatial Net further refines the object tracking state using local spatial object information. Extensive experiments on four of the largest tracking benchmarks, including VOT2014, VOT2016, OTB50, and OTB100, demonstrate the competitive performance of the proposed tracker against a number of state-of-the-art algorithms.
{"title":"Robust Object Tracking Based on Temporal and Spatial Deep Networks","authors":"Zhu Teng, Junliang Xing, Qiang Wang, Congyan Lang, Songhe Feng, Yi Jin","doi":"10.1109/ICCV.2017.130","DOIUrl":"https://doi.org/10.1109/ICCV.2017.130","url":null,"abstract":"Recently deep neural networks have been widely employed to deal with the visual tracking problem. In this work, we present a new deep architecture which incorporates the temporal and spatial information to boost the tracking performance. Our deep architecture contains three networks, a Feature Net, a Temporal Net, and a Spatial Net. The Feature Net extracts general feature representations of the target. With these feature representations, the Temporal Net encodes the trajectory of the target and directly learns temporal correspondences to estimate the object state from a global perspective. Based on the learning results of the Temporal Net, the Spatial Net further refines the object tracking state using local spatial object information. Extensive experiments on four of the largest tracking benchmarks, including VOT2014, VOT2016, OTB50, and OTB100, demonstrate competing performance of the proposed tracker over a number of state-of-the-art algorithms.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"38 1","pages":"1153-1162"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91330576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 57
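Structurally, the tracker is three cooperating modules. The PyTorch skeleton below mirrors that decomposition only at the interface level; the layer choices, the 4-vector box parameterization, and the way the Spatial Net refines the Temporal Net's estimate are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """General feature extractor applied to each search-region crop."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
    def forward(self, x):
        return self.conv(x)                       # (B, 64)

class TemporalNet(nn.Module):
    """Encodes the feature trajectory across frames into a global state estimate."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(64, 64, batch_first=True)
        self.fc = nn.Linear(64, 4)                # assumed (cx, cy, w, h) box estimate
    def forward(self, feats_seq):                 # (B, T, 64)
        _, h = self.rnn(feats_seq)
        return self.fc(h[-1])

class SpatialNet(nn.Module):
    """Refines the temporal estimate with features from the current frame."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(64 + 4, 64), nn.ReLU(inplace=True), nn.Linear(64, 4))
    def forward(self, feat_now, coarse_box):
        return coarse_box + self.fc(torch.cat([feat_now, coarse_box], dim=1))

feature_net, temporal_net, spatial_net = FeatureNet(), TemporalNet(), SpatialNet()
crops = torch.randn(1, 8, 3, 96, 96)              # one track, 8 frames of crops
feats = feature_net(crops.flatten(0, 1)).view(1, 8, -1)
coarse = temporal_net(feats)
refined = spatial_net(feats[:, -1], coarse)       # refined box for the latest frame
```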
Raster-to-Vector: Revisiting Floorplan Transformation
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.241
Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa
This paper addresses the problem of converting a rasterized floorplan image into a vector-graphics representation. Unlike existing approaches that rely on a sequence of low-level image processing heuristics, we adopt a learning-based approach. A neural architecture first transforms a rasterized image to a set of junctions that represent low-level geometric and semantic information (e.g., wall corners or door end-points). Integer programming is then formulated to aggregate junctions into a set of simple primitives (e.g., wall lines, door lines, or icon boxes) to produce a vectorized floorplan, while ensuring a topologically and geometrically consistent result. Our algorithm significantly outperforms existing methods and achieves around 90% precision and recall, getting to the range of production-ready performance. The vector representation allows 3D model popup for better indoor scene visualization, direct model manipulation for architectural remodeling, and further computational applications such as data analysis. Our system is efficient: we have converted a hundred thousand production-level floorplan images into the vector representation and generated 3D popup models.
{"title":"Raster-to-Vector: Revisiting Floorplan Transformation","authors":"Chen Liu, Jiajun Wu, Pushmeet Kohli, Yasutaka Furukawa","doi":"10.1109/ICCV.2017.241","DOIUrl":"https://doi.org/10.1109/ICCV.2017.241","url":null,"abstract":"This paper addresses the problem of converting a rasterized floorplan image into a vector-graphics representation. Unlike existing approaches that rely on a sequence of lowlevel image processing heuristics, we adopt a learning-based approach. A neural architecture first transforms a rasterized image to a set of junctions that represent low-level geometric and semantic information (e.g., wall corners or door end-points). Integer programming is then formulated to aggregate junctions into a set of simple primitives (e.g., wall lines, door lines, or icon boxes) to produce a vectorized floorplan, while ensuring a topologically and geometrically consistent result. Our algorithm significantly outperforms existing methods and achieves around 90% precision and recall, getting to the range of production-ready performance. The vector representation allows 3D model popup for better indoor scene visualization, direct model manipulation for architectural remodeling, and further computational applications such as data analysis. Our system is efficient: we have converted hundred thousand production-level floorplan images into the vector representation and generated 3D popup models.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"2214-2222"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90461539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 142
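The junction-to-primitive aggregation step is an integer program. A drastically reduced version, selecting candidate wall primitives so that each junction is claimed at most once while maximizing detector scores, can be posed with `scipy.optimize.milp` (requires SciPy 1.9 or newer). The incidence matrix and scores below are made up, and the real formulation uses much richer consistency constraints.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy setting: 4 detected junctions, 5 candidate wall-line primitives.
# A[j, p] = 1 if primitive p claims junction j; scores are hypothetical detector outputs.
A = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)
scores = np.array([0.9, 0.4, 0.8, 0.3, 0.7])

# Binary selection x in {0,1}^5: maximize total score (minimize its negative)
# subject to "each junction claimed at most once", a much-simplified stand-in
# for the paper's topological/geometric consistency constraints.
res = milp(
    c=-scores,
    constraints=[LinearConstraint(A, lb=-np.inf, ub=1.0)],
    integrality=np.ones_like(scores),
    bounds=Bounds(np.zeros_like(scores), np.ones_like(scores)),
)
selected = np.flatnonzero(res.x > 0.5)
print(selected, -res.fun)          # chosen primitives and their total score
```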
Multimodal Gaussian Process Latent Variable Models with Harmonization
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.538
Guoli Song, Shuhui Wang, Qingming Huang, Q. Tian
In this work, we address the multimodal learning problem with Gaussian process latent variable models (GPLVMs) and their application to cross-modal retrieval. Existing GPLVM-based studies generally impose individual priors over the model parameters and ignore the intrinsic relations among these parameters. Considering the strong complementarity between modalities, we propose a novel joint prior over the parameters for multimodal GPLVMs to propagate multimodal information in both kernel hyperparameter spaces and latent space. The joint prior is formulated as a harmonization constraint on the model parameters, which enforces the agreement among the modality-specific GP kernels and the similarity in the latent space. We incorporate the harmonization mechanism into the learning process of multimodal GPLVMs. The proposed methods are evaluated on three widely used multimodal datasets for cross-modal retrieval. Experimental results show that the harmonization mechanism is beneficial to the GPLVM algorithms for learning non-linear correlation among heterogeneous modalities.
{"title":"Multimodal Gaussian Process Latent Variable Models with Harmonization","authors":"Guoli Song, Shuhui Wang, Qingming Huang, Q. Tian","doi":"10.1109/ICCV.2017.538","DOIUrl":"https://doi.org/10.1109/ICCV.2017.538","url":null,"abstract":"In this work, we address multimodal learning problem with Gaussian process latent variable models (GPLVMs) and their application to cross-modal retrieval. Existing GPLVM based studies generally impose individual priors over the model parameters and ignore the intrinsic relations among these parameters. Considering the strong complementarity between modalities, we propose a novel joint prior over the parameters for multimodal GPLVMs to propagate multimodal information in both kernel hyperparameter spaces and latent space. The joint prior is formulated as a harmonization constraint on the model parameters, which enforces the agreement among the modality-specific GP kernels and the similarity in the latent space. We incorporate the harmonization mechanism into the learning process of multimodal GPLVMs. The proposed methods are evaluated on three widely used multimodal datasets for cross-modal retrieval. Experimental results show that the harmonization mechanism is beneficial to the GPLVM algorithms for learning non-linear correlation among heterogeneous modalities.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"37 1","pages":"5039-5047"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76644501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
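To make the idea of a harmonization constraint concrete, the sketch below writes the standard GPLVM negative log marginal likelihood for two modalities that share latent positions and adds a quadratic penalty pulling their kernel hyperparameters together. This is only an objective function under simplifying assumptions (an RBF kernel, a quadratic penalty standing in for the paper's joint prior) with no optimization loop; all variable names and sizes are invented.

```python
import numpy as np

def rbf_kernel(x, log_ls, log_var, jitter=1e-6):
    """Squared-exponential kernel on shared latent points x of shape (N, Q)."""
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return np.exp(log_var) * np.exp(-0.5 * sq / np.exp(2 * log_ls)) + jitter * np.eye(len(x))

def gplvm_nll(y, k):
    """GPLVM negative log marginal likelihood for one modality, up to a constant:
    0.5 * D * log|K| + 0.5 * tr(K^{-1} Y Y^T)."""
    d = y.shape[1]
    _, logdet = np.linalg.slogdet(k)
    return 0.5 * d * logdet + 0.5 * np.trace(np.linalg.solve(k, y @ y.T))

def harmonized_objective(x, y_img, y_txt, theta_img, theta_txt, gamma=1.0):
    """Sum of the two modality NLLs plus a quadratic 'harmonization' penalty that
    pulls the two sets of kernel hyperparameters toward each other."""
    nll = (gplvm_nll(y_img, rbf_kernel(x, *theta_img))
           + gplvm_nll(y_txt, rbf_kernel(x, *theta_txt)))
    return nll + gamma * np.sum((np.asarray(theta_img) - np.asarray(theta_txt)) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))                       # shared latent positions
y_img, y_txt = rng.normal(size=(30, 20)), rng.normal(size=(30, 10))
print(harmonized_objective(x, y_img, y_txt, (0.0, 0.0), (0.3, -0.1)))
```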
Cross-Modal Deep Variational Hashing
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.439
Venice Erin Liong, Jiwen Lu, Yap-Peng Tan, Jie Zhou
In this paper, we propose a cross-modal deep variational hashing (CMDVH) method for cross-modality multimedia retrieval. Unlike existing cross-modal hashing methods which learn a single pair of projections to map each example as a binary vector, we design a pair of deep neural networks to learn non-linear transformations from image-text input pairs, so that unified binary codes can be obtained. We then design the modality-specific neural networks in a probabilistic manner where we model a latent variable as close as possible to the inferred binary codes, which is approximated by a posterior distribution regularized by a known prior. Experimental results on three benchmark datasets show the efficacy of the proposed approach.
{"title":"Cross-Modal Deep Variational Hashing","authors":"Venice Erin Liong, Jiwen Lu, Yap-Peng Tan, Jie Zhou","doi":"10.1109/ICCV.2017.439","DOIUrl":"https://doi.org/10.1109/ICCV.2017.439","url":null,"abstract":"In this paper, we propose a cross-modal deep variational hashing (CMDVH) method for cross-modality multimedia retrieval. Unlike existing cross-modal hashing methods which learn a single pair of projections to map each example as a binary vector, we design a couple of deep neural network to learn non-linear transformations from image-text input pairs, so that unified binary codes can be obtained. We then design the modality-specific neural networks in a probabilistic manner where we model a latent variable as close as possible from the inferred binary codes, which is approximated by a posterior distribution regularized by a known prior. Experimental results on three benchmark datasets show the efficacy of the proposed approach.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"18 1","pages":"4097-4105"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75080068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 78
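Whatever the training procedure, cross-modal hashing methods share a simple retrieval step: binarize the real-valued hash-layer outputs and rank database items by Hamming distance. The NumPy sketch below shows only that step, with random arrays standing in for the image- and text-network outputs; the code length of 32 bits is an arbitrary choice.

```python
import numpy as np

def to_binary_codes(real_valued):
    """Binarize continuous hash-layer outputs to {0, 1} codes by thresholding at zero."""
    return (real_valued > 0).astype(np.uint8)

def hamming_rank(query_codes, database_codes):
    """Rank database items by Hamming distance to each query (smaller is better)."""
    dists = (query_codes[:, None, :] != database_codes[None, :, :]).sum(-1)
    return np.argsort(dists, axis=1), dists

rng = np.random.default_rng(0)
img_out = rng.normal(size=(5, 32))      # stand-in for image-network hash outputs
txt_out = rng.normal(size=(100, 32))    # stand-in for text-network hash outputs
img_codes, txt_codes = to_binary_codes(img_out), to_binary_codes(txt_out)
ranking, dists = hamming_rank(img_codes, txt_codes)
print(ranking[0, :5])                   # top-5 text items for the first image query
```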