
Latest Publications: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Coordinating Multiple Disparity Proposals for Stereo Computation
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.436
Ang Li, Dapeng Chen, Yuanliu Liu, Zejian Yuan
While great progress has been made in stereo computation over the last decades, large textureless regions remain challenging. Segment-based methods can tackle this problem properly, but their performance is sensitive to the segmentation results. In this paper, we alleviate this sensitivity by generating multiple proposals on absolute and relative disparities from multiple segmentations. These proposals supply rich descriptions of surface structures. In particular, the relative disparity between distant pixels can encode large-scale structure, which is critical for handling large textureless regions. The proposals are coordinated by point-wise competition and pairwise collaboration within an MRF model. During inference, dynamic programming is performed in different directions with various step sizes, so long-range connections are better preserved. In the experiments, we carefully analyze the effectiveness of the major components. Results on the Middlebury 2014 and KITTI 2015 stereo benchmarks show that our method is comparable to the state of the art.
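As a rough illustration of coordinating per-pixel disparity proposals with a pairwise smoothness term, the following is a minimal scanline dynamic-programming sketch. It assumes generic unary costs and an absolute-difference smoothness penalty; it is a toy example, not the authors' MRF with point-wise competition, pairwise collaboration, and multi-direction, multi-step-size passes.

```python
import numpy as np

def scanline_dp(unary, proposals, smooth_weight=1.0):
    """Pick one disparity proposal per pixel along a 1-D scanline.

    unary     : (N, K) matching cost of proposal k at pixel i
    proposals : (N, K) candidate disparity values
    Returns the chosen disparity per pixel (Viterbi-style DP).
    """
    N, K = unary.shape
    cost = np.zeros((N, K))              # accumulated cost per label
    back = np.zeros((N, K), dtype=int)   # backpointers
    cost[0] = unary[0]
    for i in range(1, N):
        # pairwise term: penalize disparity jumps between neighboring pixels
        jump = np.abs(proposals[i][None, :] - proposals[i - 1][:, None])
        total = cost[i - 1][:, None] + smooth_weight * jump
        back[i] = np.argmin(total, axis=0)
        cost[i] = unary[i] + total[back[i], np.arange(K)]
    # backtrack the optimal labeling
    labels = np.zeros(N, dtype=int)
    labels[-1] = int(np.argmin(cost[-1]))
    for i in range(N - 2, -1, -1):
        labels[i] = back[i + 1, labels[i + 1]]
    return proposals[np.arange(N), labels]

# toy example: 5 pixels, 3 disparity proposals each
rng = np.random.default_rng(0)
props = rng.integers(0, 10, size=(5, 3)).astype(float)
costs = rng.random((5, 3))
print(scanline_dp(costs, props))
```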
Citations: 29
MDL-CW: A Multimodal Deep Learning Framework with CrossWeights
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.285
Sarah Rastegar, M. Baghshah, H. Rabiee, Seyed Mohsen Shojaee
Deep learning has received much attention in recent years as one of the most powerful approaches for multimodal representation learning. An ideal model for multimodal data can reason about missing modalities using the available ones, and usually provides more information when multiple modalities are considered. Previous deep models contain separate modality-specific networks and find a shared representation on top of those networks; therefore, they only consider high-level interactions between modalities when finding a joint representation. In this paper, we propose a multimodal deep learning framework with cross weights (MDL-CW) that exploits the cross weights between modality representations and gradually learns the interactions of the modalities in a deep network, from low-level to high-level interactions. Moreover, we show theoretically that considering these interactions provides more intra-modality information, and we introduce a multi-stage pre-training method based on the properties of multimodal data. In the proposed framework, as opposed to existing deep methods for multimodal data, we reconstruct the representation of each modality at a given layer from the representations of the other modalities in the previous layer. Extensive experimental results show that the proposed model outperforms state-of-the-art information retrieval methods for both image and text queries on the PASCAL-sentence and SUN-Attribute databases.
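To make the cross-weight idea concrete, here is a minimal numpy forward pass in which each modality's hidden layer mixes its own previous layer with the other modality's previous layer through cross weights. All dimensions, weight matrices, and the shared top layer are invented placeholders, not the MDL-CW architecture or its training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# toy dimensions: image features (dx), text features (dy), hidden size h
dx, dy, h = 8, 6, 4
x = rng.random((1, dx))          # image-modality input
y = rng.random((1, dy))          # text-modality input

# within-modality weights and cross weights (random here; in training these
# would be learned, e.g. with reconstruction and pre-training objectives)
Wxx, Wyy = rng.random((dx, h)), rng.random((dy, h))
Wxy, Wyx = rng.random((dx, h)), rng.random((dy, h))   # cross weights

# each modality's hidden layer mixes its own previous layer with the other
# modality's previous layer through the cross weights
hx = relu(x @ Wxx + y @ Wyx)
hy = relu(y @ Wyy + x @ Wxy)

# a shared top representation built on both hidden layers
Ws = rng.random((2 * h, h))
shared = relu(np.concatenate([hx, hy], axis=1) @ Ws)
print(shared.shape)   # (1, 4)
```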
Citations: 43
A Task-Oriented Approach for Cost-Sensitive Recognition
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.242
Roozbeh Mottaghi, Hannaneh Hajishirzi, Ali Farhadi
With the recent progress in visual recognition, we have already started to see a surge of vision-related real-world applications. These applications, unlike general scene understanding, are task-oriented and require specific information from visual data. Considering the current growth in new sensory devices, feature designs, feature learning methods, and algorithms, the search over the space of features and models becomes combinatorial. In this paper, we propose a novel cost-sensitive, task-oriented recognition method based on a combination of linguistic semantics and visual cues. Our task-oriented framework is able to generalize to unseen tasks for which there is no training data, and it outperforms state-of-the-art cost-based recognition baselines on our new task-based dataset.
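The abstract does not detail the model, so the sketch below only illustrates the general notion of cost-sensitive recognition: greedily choosing which recognition modules to run under a compute budget. The module names, costs, and expected gains are entirely invented and do not reflect the paper's method.

```python
# Toy cost-sensitive module selection: rank recognizers by expected gain per
# unit cost and greedily pick them until a budget is exhausted.
modules = {
    "color_histogram":  {"cost": 1.0, "expected_gain": 0.05},
    "object_detector":  {"cost": 5.0, "expected_gain": 0.30},
    "scene_classifier": {"cost": 3.0, "expected_gain": 0.20},
    "text_spotter":     {"cost": 4.0, "expected_gain": 0.10},
}

def select_modules(modules, budget):
    chosen, spent = [], 0.0
    ranked = sorted(modules.items(),
                    key=lambda kv: kv[1]["expected_gain"] / kv[1]["cost"],
                    reverse=True)
    for name, info in ranked:
        if spent + info["cost"] <= budget:
            chosen.append(name)
            spent += info["cost"]
    return chosen, spent

print(select_modules(modules, budget=8.0))
# (['scene_classifier', 'object_detector'], 8.0)
```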
Citations: 4
A Weighted Variational Model for Simultaneous Reflectance and Illumination Estimation
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.304
Xueyang Fu, Delu Zeng, Yue Huang, Xiao-Ping Zhang, Xinghao Ding
We propose a weighted variational model to estimate both the reflectance and the illumination from an observed image. We show that, although it is widely adopted for ease of modeling, the log-transformed image is not ideal for this task. Building on this investigation of the logarithmic transformation, we propose a new weighted variational model whose regularization terms impose a better prior representation. Unlike conventional variational models, the proposed model preserves more detail in the estimated reflectance and can also suppress noise to some extent. An alternating minimization scheme is adopted to solve the model. Experimental results demonstrate the effectiveness of the proposed model and its algorithm. Compared with other variational methods, the proposed method yields comparable or better results on both subjective and objective assessments.
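The following is a minimal alternating scheme in the spirit of reflectance/illumination decomposition, assuming the observed image factors as S ≈ R·L and using a box filter as a crude stand-in for a smoothness prior on the illumination. It is not the paper's weighted variational model or its optimization; all parameters are placeholders.

```python
import numpy as np

def box_blur(img, k=5):
    """Box filter via an integral image (stands in for a smoothness prior)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    c = np.cumsum(np.cumsum(p, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)

def decompose(S, iters=10, eps=1e-6):
    """Alternately estimate illumination L and reflectance R with S ≈ R·L.

    Illumination step: smooth S/R (spatially smooth L, kept >= S);
    Reflectance step:  pointwise division R = S/L clipped to [0, 1].
    """
    L = box_blur(S)                                  # initial illumination
    R = np.clip(S / (L + eps), 0, 1)
    for _ in range(iters):
        L = np.maximum(box_blur(S / (R + eps)), S)   # illumination >= image
        R = np.clip(S / (L + eps), 0, 1)
    return R, L

# toy image: a dark horizontal gradient with a brighter square
S = np.tile(np.linspace(0.1, 0.6, 64), (64, 1))
S[20:40, 20:40] += 0.3
R, L = decompose(S)
print(R.min(), R.max(), L.min(), L.max())
```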
Citations: 642
Face2Face: Real-Time Face Capture and Reenactment of RGB Videos
Pub Date : 2016-06-27 DOI: 10.1145/3292039
Justus Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, M. Nießner
We present a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target video using a dense photometric consistency measure. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup, where YouTube videos are reenacted in real time.
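As a rough illustration of expression transfer in blendshape-parameter space, the sketch below applies expression coefficients (assumed to have been estimated on the source actor by face tracking) to a target identity's own blendshape basis. The toy blendshape model and all dimensions are invented; this is not the Face2Face pipeline, which also handles tracking, mouth retrieval, and photo-realistic re-rendering.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy blendshape model: V vertices in 3-D, B expression blendshapes
V, B = 100, 5
target_neutral = rng.random((V, 3))                 # target actor's neutral face
target_blendshapes = 0.05 * rng.random((B, V, 3))   # target expression offsets

# expression coefficients estimated on the source actor (e.g. by fitting a
# parametric face model to the webcam stream frame by frame)
source_expression = np.array([0.8, 0.0, 0.3, 0.0, 0.5])

def transfer_expression(neutral, blendshapes, coeffs):
    """Apply source expression coefficients to the target's blendshape basis,
    preserving the target's identity while transferring the expression."""
    return neutral + np.tensordot(coeffs, blendshapes, axes=1)

reenacted = transfer_expression(target_neutral, target_blendshapes,
                                source_expression)
print(reenacted.shape)   # (100, 3)
```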
Citations: 1552
Learning Aligned Cross-Modal Representations from Weakly Aligned Data
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.321
Lluís Castrejón, Y. Aytar, Carl Vondrick, H. Pirsiavash, A. Torralba
People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks can categorize cross-modal scenes well, they also learn an intermediate representation that is not aligned across modalities, which is undesirable for cross-modal transfer applications. We present methods to regularize cross-modal convolutional neural networks so that they have a shared representation that is agnostic to the modality. Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval. Moreover, our visualizations suggest that units emerge in the shared representation that tend to activate on consistent concepts independently of the modality.
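As a minimal stand-in for such a regularizer, the sketch below penalizes differences in the first- and second-order statistics of shared-layer activations coming from two modality branches. The statistics-matching loss is an assumption chosen for illustration, not the authors' exact objective.

```python
import numpy as np

def alignment_loss(feat_a, feat_b):
    """Penalize gaps in mean and covariance between the shared-layer
    activations of two modalities (a simple modality-agnosticism proxy)."""
    mean_gap = np.sum((feat_a.mean(axis=0) - feat_b.mean(axis=0)) ** 2)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    cov_gap = np.sum((cov_a - cov_b) ** 2)
    return mean_gap + cov_gap

rng = np.random.default_rng(0)
natural_feats = rng.normal(0.0, 1.0, size=(32, 16))   # e.g. photo branch
sketch_feats = rng.normal(0.5, 1.5, size=(32, 16))    # e.g. line-drawing branch
print(alignment_loss(natural_feats, sketch_feats))     # > 0 for mismatched stats
print(alignment_loss(natural_feats, natural_feats))    # 0 for identical stats
```

In practice such a term would be added to the classification loss during training so that the shared layer cannot encode which modality an input came from.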
Citations: 158
Monocular Depth Estimation Using Neural Regression Forest
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.594
Anirban Roy, S. Todorovic
This paper presents a novel deep architecture, called neural regression forest (NRF), for depth estimation from a single image. NRF combines random forests and convolutional neural networks (CNNs). Scanning windows extracted from the image represent samples that are passed down the trees of the NRF to predict their depth. At every tree node, the sample is filtered with a CNN associated with that node. Results of the convolutional filtering are passed to the left and right children nodes, i.e., the corresponding CNNs, with a Bernoulli probability, until the leaves, where depth estimates are made. The CNNs at every node are designed to have fewer parameters than in recent work, but their stacked processing along a path in the tree effectively amounts to a deeper CNN. NRF allows parallelizable training of all "shallow" CNNs and efficient enforcement of smoothness in the depth estimation results. Our evaluation on the benchmark Make3D and NYUv2 datasets demonstrates that NRF outperforms the state of the art and gracefully handles gradually decreasing training datasets.
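The following is a minimal soft decision tree in numpy capturing the routing idea: each inner node applies a small transformation (a linear map standing in for the node's CNN) and a Bernoulli routing probability, and the prediction is the expectation of per-leaf depth values over root-to-leaf path probabilities. Sizes, weights, and leaf values are random placeholders, not the trained NRF.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = 8                                   # feature size carried down the tree
depth = 3                               # inner-node depth -> 2**depth leaves
n_nodes = 2 ** depth - 1
n_leaves = 2 ** depth

node_W = 0.1 * rng.normal(size=(n_nodes, H, H))      # stand-in for each node's CNN
route_w = rng.normal(size=(n_nodes, H))              # Bernoulli routing weights
leaf_depth = rng.uniform(1.0, 10.0, size=n_leaves)   # regressed depth per leaf

def predict(x):
    """Soft-route a window's feature vector down the tree and return the
    expected depth, weighting each leaf by its path probability."""
    leaf_prob = np.zeros(n_leaves)

    def descend(node, feat, prob, lo, hi):
        if node >= n_nodes:                           # reached a single leaf
            leaf_prob[lo] = prob
            return
        h = np.maximum(feat @ node_W[node], 0.0)      # node's filtering step
        p_left = sigmoid(float(h @ route_w[node]))    # probability of going left
        mid = (lo + hi) // 2
        descend(2 * node + 1, h, prob * p_left, lo, mid)
        descend(2 * node + 2, h, prob * (1.0 - p_left), mid, hi)

    descend(0, x, 1.0, 0, n_leaves)
    return float(leaf_prob @ leaf_depth)

print(predict(rng.normal(size=H)))
```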
Citations: 292
Canny Text Detector: Fast and Robust Scene Text Localization Algorithm
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.388
Hojin Cho, Myung-Chul Sung, Bongjin Jun
This paper presents a novel scene text detection algorithm, Canny Text Detector, which takes advantage of the similarity between image edges and text for effective text localization with an improved recall rate. Just as closely related edge pixels construct the structural information of an object, we observe that cohesive characters compose a meaningful word or sentence sharing similar properties such as spatial location, size, color, and stroke width, regardless of language. However, prevalent scene text detection approaches have not fully utilized such similarity; they mostly rely on characters classified with high confidence, leading to a low recall rate. By exploiting the similarity, our approach can quickly and robustly localize a variety of texts. Inspired by the original Canny edge detector, our algorithm makes use of double thresholding and hysteresis tracking to detect texts of low confidence. Experimental results on public datasets demonstrate that our algorithm outperforms state-of-the-art scene text detection methods in terms of detection rate.
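The double-threshold-plus-hysteresis idea can be sketched on character candidates as follows: keep high-confidence candidates, then admit low-confidence ones reachable through a similarity graph. The thresholds and the toy adjacency graph are invented; this is only an illustration of the Canny-style selection rule, not the full detector.

```python
import numpy as np

def hysteresis_select(conf, adjacency, t_low=0.4, t_high=0.7):
    """Canny-style double thresholding over character candidates.

    conf      : (N,) classifier confidence of each candidate character
    adjacency : (N, N) boolean matrix, True where two candidates share
                similar size/color/stroke width and are spatially close
    Keeps every candidate above t_high, plus any candidate above t_low
    reachable from a kept one through the adjacency graph (hysteresis).
    """
    keep = conf >= t_high
    frontier = list(np.flatnonzero(keep))
    while frontier:
        i = frontier.pop()
        for j in np.flatnonzero(adjacency[i]):
            if not keep[j] and conf[j] >= t_low:
                keep[j] = True
                frontier.append(j)
    return keep

# toy example: candidates 0 and 1 are strong, 2 is weak but linked to 1,
# 3 is weak and isolated, 4 is below the low threshold entirely
conf = np.array([0.9, 0.8, 0.5, 0.5, 0.2])
adj = np.zeros((5, 5), dtype=bool)
adj[1, 2] = adj[2, 1] = True
adj[2, 4] = adj[4, 2] = True
print(hysteresis_select(conf, adj))   # [ True  True  True False False]
```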
Citations: 104
Object Co-segmentation via Graph Optimized-Flexible Manifold Ranking
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.81
Rong Quan, Junwei Han, Dingwen Zhang, F. Nie
Aiming at automatically discovering the common objects contained in a set of relevant images and simultaneously segmenting them as foreground, object co-segmentation has become an active research topic in recent years. Although a number of approaches have been proposed to address this problem, many of them are designed with misleading assumptions, unscalable priors, or low flexibility, and thus still suffer from limitations that reduce their capability in real-world scenarios. To alleviate these limitations, we propose a novel two-stage co-segmentation framework, which introduces a weak background prior to establish a globally closed-loop graph that represents the common objects and the union background separately. A novel graph optimized-flexible manifold ranking algorithm is then proposed to flexibly optimize the graph connections and node labels to co-segment the common objects. Experiments on three image datasets demonstrate that our method outperforms other state-of-the-art methods.
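For context, the classical manifold ranking step that this line of work builds on has the closed form f* = (I - alpha * S)^(-1) y with the symmetrically normalized affinity S. The sketch below shows that step on a toy graph; the affinity values, seed, and alpha are placeholders, and the paper's graph construction and flexible-manifold extension are not reproduced.

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.99):
    """Closed-form manifold ranking on a graph:
    f* = (I - alpha * S)^(-1) y, with S = D^(-1/2) W D^(-1/2).

    W : (N, N) symmetric affinity between graph nodes (e.g. superpixels)
    y : (N,)   initial query vector (1 for seed nodes, 0 otherwise)
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    N = W.shape[0]
    return np.linalg.solve(np.eye(N) - alpha * S, y)

# toy graph: two clusters joined by one weak edge; seed node 0
W = np.array([[0.0, 1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.1, 0.0],
              [0.0, 0.0, 0.1, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0, 0.0]])
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
scores = manifold_ranking(W, y)
print(np.round(scores, 3))   # the seeded cluster (nodes 0-2) ranks above nodes 3-4
```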
Citations: 79
Structural Correlation Filter for Robust Visual Tracking
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.467
Si Liu, Tianzhu Zhang, Xiaochun Cao, Changsheng Xu
In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking. The proposed SCF model incorporates part-based tracking strategies into a correlation filter tracker and exploits circular shifts of all parts for motion modeling, so as to preserve the target object's structure. Compared with existing correlation filter trackers, our proposed tracker has several advantages: (1) Due to the part strategy, the learned structural correlation filters are less sensitive to partial occlusion and are computationally efficient and robust. (2) The learned filters not only distinguish the parts from the background, as traditional correlation filters do, but also exploit the intrinsic relationship among local parts via spatial constraints to preserve the object structure. (3) The learned correlation filters not only make most parts share similar motion but also tolerate outlier parts that have different motion. Both qualitative and quantitative evaluations on challenging benchmark image sequences demonstrate that the proposed SCF tracking algorithm performs favorably against several state-of-the-art methods.
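To illustrate the correlation-filter building block for a single part, the sketch below learns a MOSSE-style filter in the Fourier domain, where circular shifts implicitly provide the training samples. It covers only one part and omits the structural (multi-part) modeling that is the paper's contribution; the patch, Gaussian width, and regularizer are placeholders.

```python
import numpy as np

def train_filter(patch, sigma=2.0, lam=1e-3):
    """Learn a single-part correlation filter in the Fourier domain
    (closed form): Hstar = (G * conj(F)) / (F * conj(F) + lam)."""
    h, w = patch.shape
    # desired response: a Gaussian peak centred on the part
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))
    F = np.fft.fft2(patch)
    G = np.fft.fft2(g)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def respond(Hstar, patch):
    """Correlate a search patch with the learned filter; the response peak
    gives the part's displacement."""
    return np.real(np.fft.ifft2(Hstar * np.fft.fft2(patch)))

rng = np.random.default_rng(0)
part = rng.random((32, 32))
Hstar = train_filter(part)
resp = respond(Hstar, np.roll(part, (3, 5), axis=(0, 1)))   # shifted test patch
print(np.unravel_index(np.argmax(resp), resp.shape))        # peak moves from (16, 16) to (19, 21)
```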
Citations: 165