
2017 IEEE International Conference on Computer Vision Workshops (ICCVW): Latest Publications

Scale-Free Content Based Image Retrieval (or Nearly so)
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.42
Adrian Daniel Popescu, Alexandra-Lucian Gînsca, H. Borgne
When textual annotations of Web and social media images are poor or missing, content-based image retrieval is an interesting way to access them. Finding an optimal trade-off between accuracy and scalability for CBIR is challenging in practice. We propose a retrieval method whose complexity is nearly independent of the collection scale and does not degrade result quality. Images are represented with sparse semantic features that can be stored as an inverted index. Search complexity is drastically reduced by (1) considering the query feature dimensions independently, thus turning search into a concatenation operation, and (2) pruning the index as a function of a retrieval objective. To improve precision, the inverted index look-up is complemented with an exhaustive search over a fixed-size list of intermediary results. We run experiments on three public collections, and the results show that our much faster method slightly outperforms an exhaustive search done with two competitive baselines.
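A minimal Python sketch of the retrieval scheme the abstract describes: sparse semantic features stored as a pruned inverted index, per-dimension look-up of the query, and exhaustive re-ranking of a fixed-size candidate list. The feature values and the `keep_per_dim` / `shortlist` parameters are hypothetical; this illustrates the idea, not the authors' implementation.

```python
import numpy as np

# Hypothetical sparse "semantic" features: rows = images, columns = concepts.
rng = np.random.default_rng(0)
feats = rng.random((1000, 64)) * (rng.random((1000, 64)) < 0.05)  # ~5% non-zero

keep_per_dim = 50   # pruning depth: strongest images kept per concept (assumed)
shortlist = 100     # fixed-size list that is re-ranked exhaustively (assumed)

# Pruned inverted index: concept -> (image id, activation), sorted by activation.
index = {}
for d in range(feats.shape[1]):
    top = np.argsort(-feats[:, d])[:keep_per_dim]
    index[d] = [(int(i), float(feats[i, d])) for i in top if feats[i, d] > 0]

def search(query):
    # (1) per-dimension look-up: concatenate the posting lists of the query's
    #     non-zero dimensions and accumulate partial scores
    scores = {}
    for d in np.flatnonzero(query):
        for img, val in index[int(d)]:
            scores[img] = scores.get(img, 0.0) + float(query[d]) * val
    # (2) exhaustive re-ranking of a fixed-size shortlist with the full features
    cand = sorted(scores, key=scores.get, reverse=True)[:shortlist]
    exact = [(img, float(feats[img] @ query)) for img in cand]
    return sorted(exact, key=lambda t: t[1], reverse=True)

print(search(feats[0])[:5])  # the query image itself should rank at or near the top
```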
Citations: 1
A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.159
Girmaw Abebe, A. Cavallaro
Temporal information is the main source of discriminating characteristics for the recognition of proprioceptive activities in first-person vision (FPV). In this paper, we propose a motion representation that uses stacked spectrograms. These spectrograms are generated over temporal windows from mean grid-optical-flow vectors and the displacement vectors of the intensity centroid. The stacked representation enables us to use 2D convolutions to learn and extract global motion features. Moreover, we employ a long short-term memory (LSTM) network to encode the temporal dependency among consecutive samples recursively. Experimental results show that the proposed approach achieves state-of-the-art performance on the largest public dataset for FPV activity recognition.
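As a rough illustration of the architecture described above (not the authors' exact network), the PyTorch sketch below applies 2D convolutions to stacked motion spectrograms within each temporal window and an LSTM across consecutive windows; all layer sizes and the number of spectrogram channels are assumptions.

```python
import torch
import torch.nn as nn

class SpectrogramConvLSTM(nn.Module):
    """Toy CNN+LSTM: 2D convolutions over stacked motion spectrograms,
    then an LSTM over consecutive temporal windows."""
    def __init__(self, in_channels=4, hidden=128, num_classes=10):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),         # one global motion feature per window
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                    # x: (batch, time, channels, H, W)
        b, t, c, h, w = x.shape
        f = self.cnn(x.reshape(b * t, c, h, w)).reshape(b, t, 64)
        out, _ = self.lstm(f)                # encode temporal dependency recursively
        return self.fc(out[:, -1])           # classify from the last hidden state

logits = SpectrogramConvLSTM()(torch.randn(2, 8, 4, 32, 32))
print(logits.shape)                          # torch.Size([2, 10])
```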
Citations: 20
Spatial-Temporal Weighted Pyramid Using Spatial Orthogonal Pooling
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.127
Yusuke Mukuta, Y. Ushiku, T. Harada
Feature pooling is a method that summarizes local descriptors in an image using spatial information. Spatial pyramid matching uses the statistics of local features in an image subregion as a global feature. However, the disadvantages of this method are that there is no theoretical guideline for selecting the pooling region, robustness to small image translation is lost around the edges of the pooling region, the information encoded in the different feature pyramids overlaps, and thus recognition performance stagnates as a greater pyramid size is selected. In this research, we propose a novel interpretation that regards feature pooling as an orthogonal projection in the space of functions that maps the image space to the local feature space. Moreover, we propose a novel feature-pooling method that orthogonally projects the function form of local descriptors into the space of low-degree polynomials. We also evaluate the robustness of the proposed method. Experimental results demonstrate the effectiveness of the proposed methods.
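A small numpy sketch of the underlying idea, under the assumption that the projection is implemented by least-squares fitting of the descriptor field with a monomial basis in the image coordinates (least squares is an orthogonal projection in the L2 sense). The polynomial degree and the coordinate normalization are illustrative choices, not taken from the paper.

```python
import numpy as np

def orthogonal_poly_pooling(positions, descriptors, degree=2):
    """Project the map position -> descriptor onto low-degree polynomials of
    the image coordinates; the least-squares coefficients are the pooled feature.
    positions:   (n, 2) normalized (x, y) locations of local descriptors
    descriptors: (n, d) local descriptor vectors
    Returns a pooled feature of size (#monomials * d,)."""
    x, y = positions[:, 0], positions[:, 1]
    # monomial basis x^i * y^j with i + j <= degree
    basis = np.stack([x**i * y**j
                      for i in range(degree + 1)
                      for j in range(degree + 1 - i)], axis=1)
    coeffs, *_ = np.linalg.lstsq(basis, descriptors, rcond=None)
    return coeffs.ravel()

rng = np.random.default_rng(0)
pos = rng.random((200, 2))           # descriptor locations in [0, 1]^2
desc = rng.random((200, 16))         # e.g. SIFT-like local descriptors
print(orthogonal_poly_pooling(pos, desc).shape)   # (6 * 16,) for degree 2
```

Note that degree 0 reduces to plain average pooling, so the polynomial degree plays the role that the pyramid level plays in spatial pyramid matching.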
Citations: 0
Feature-Based Efficient Moving Object Detection for Low-Altitude Aerial Platforms
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.248
K. B. Logoglu, Hazal Lezki, M. K. Yucel, A. Ozturk, Alper Kucukkomurler, Batuhan Karagöz, Aykut Erdem, Erkut Erdem
Moving object detection is one of the integral tasks for aerial reconnaissance and surveillance applications. Despite the problem's rising relevance due to the increasing availability of unmanned aerial vehicles, moving object detection suffers from the lack of a widely accepted, correctly labelled dataset that would facilitate a robust evaluation of the techniques published by the community. Towards this end, we compile a new dataset by manually annotating several sequences from the VIVID and UAV123 datasets for moving object detection. We also propose a feature-based, efficient pipeline that is optimized for near real-time performance on GPU-based embedded SoMs (systems on module). We evaluate our pipeline on this extended dataset for low-altitude moving object detection. Ground-truth annotations are made publicly available to the community to foster further research in the moving object detection field.
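The abstract does not spell out the pipeline, so the OpenCV sketch below shows one common feature-based scheme for moving platforms: track sparse features between frames, estimate a homography to compensate ego-motion, and threshold the difference of the registered frames. All parameter values are hypothetical and this should not be read as the authors' method.

```python
import cv2

def moving_object_mask(prev_gray, curr_gray, diff_thresh=25):
    """Compensate platform ego-motion with a feature-tracked homography,
    then threshold the difference of the registered frames."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good_prev = pts[status.ravel() == 1]
    good_next = nxt[status.ravel() == 1]
    H, _ = cv2.findHomography(good_prev, good_next, cv2.RANSAC, 3.0)
    h, w = curr_gray.shape
    registered = cv2.warpPerspective(prev_gray, H, (w, h))  # align previous frame
    diff = cv2.absdiff(curr_gray, registered)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    return mask

# Usage with two consecutive grayscale frames (uint8 numpy arrays):
# mask = moving_object_mask(frame_t, frame_t_plus_1)
```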
Citations: 23
Large-Scale Multimodal Gesture Segmentation and Recognition Based on Convolutional Neural Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.371
Huogen Wang, Pichao Wang, Zhanjie Song, W. Li
This paper presents an effective method for continuous gesture recognition. The method consists of two modules: segmentation and recognition. In the segmentation module, a continuous gesture sequence is segmented into isolated gesture sequences by classifying the frames into gesture frames and transitional frames using two-stream convolutional neural networks. In the recognition module, our method exploits the spatiotemporal information embedded in RGB and depth sequences. For the depth modality, our method converts a sequence into Dynamic Images and Motion Dynamic Images through rank pooling and inputs them to Convolutional Neural Networks, respectively. For the RGB modality, our method adopts Convolutional LSTM Networks to learn long-term spatiotemporal features from the short-term spatiotemporal features obtained by a 3D convolutional neural network. Our method has been evaluated on the ChaLearn LAP Large-scale Continuous Gesture Dataset and achieved state-of-the-art performance.
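For the depth stream, a clip is collapsed into a Dynamic Image by rank pooling. The sketch below uses the common first-order approximation of rank pooling (frame t weighted by 2t - T - 1), which may differ from the exact procedure in the paper; the rescaling to an 8-bit image is also an assumption.

```python
import numpy as np

def approximate_rank_pooling(frames):
    """Collapse a clip into a single 'dynamic image' with the first-order
    approximation of rank pooling: frame t (1-based) gets weight 2t - T - 1,
    so later frames dominate the summary.
    frames: (T, H, W) or (T, H, W, C) float array."""
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    alpha = 2.0 * t - T - 1.0
    dyn = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    # rescale to [0, 255] so the result can be fed to a 2D CNN like an image
    dyn -= dyn.min()
    if dyn.max() > 0:
        dyn *= 255.0 / dyn.max()
    return dyn.astype(np.uint8)

clip = np.random.rand(16, 64, 64)            # e.g. 16 depth frames
print(approximate_rank_pooling(clip).shape)  # (64, 64)
```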
Citations: 26
Integrating Boundary and Center Correlation Filters for Visual Tracking with Aspect Ratio Variation
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.234
Feng Li, Yingjie Yao, P. Li, D. Zhang, W. Zuo, Ming-Hsuan Yang
Aspect ratio variation frequently appears in visual tracking and has a severe influence on performance. Although many correlation filter (CF)-based trackers have been proposed for scale-adaptive tracking, few studies have addressed aspect ratio variation for CF trackers. In this paper, we make the first attempt to address this issue by introducing a family of 1D boundary CFs to localize the left, right, top, and bottom boundaries in videos. This allows us to cope with aspect ratio variation flexibly during tracking. Specifically, we present a novel tracking model that integrates 1D Boundary and 2D Center CFs (IBCCF), where near-orthogonality between the boundary and center filters is enforced by a regularization term. To optimize our IBCCF model, we develop an alternating direction method of multipliers. Experiments on several datasets show that IBCCF can effectively handle aspect ratio variation and achieves state-of-the-art performance in terms of accuracy and robustness.
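The full IBCCF model, with its near-orthogonality term and ADMM solver, is beyond a short example, but the basic building block such trackers extend is a correlation filter learned and applied in the Fourier domain. The numpy sketch below is a single-channel MOSSE-style filter, shown only to make the mechanism concrete; it is not the authors' formulation.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired correlation response: a Gaussian peak at the patch centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, label, lam=1e-2):
    """Closed-form single-channel correlation filter (MOSSE-style) in the
    Fourier domain: H = (Y * conj(X)) / (X * conj(X) + lambda)."""
    X, Y = np.fft.fft2(patch), np.fft.fft2(label)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def respond(filt, patch):
    """Correlation response map; its argmax gives the predicted translation."""
    return np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))

rng = np.random.default_rng(0)
patch = rng.random((64, 64))
filt = train_filter(patch, gaussian_label(64, 64))
resp = respond(filt, np.roll(patch, (5, -3), axis=(0, 1)))   # shifted target
print(np.unravel_index(resp.argmax(), resp.shape))           # peak follows the shift
```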
Citations: 94
Coarse-to-Fine Deep Kernel Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.137
H. Sahbi
In this paper, we address the issue of efficient computation in deep kernel networks. We propose a novel framework that dramatically reduces the complexity of evaluating these deep kernels. Our method is based on a coarse-to-fine cascade of networks designed for efficient computation; early stages of the cascade are cheap and reject many patterns efficiently, while deep stages are more expensive and accurate. The design principle of these reduced-complexity networks is based on a variant of the cross-entropy criterion that reduces the complexity of the networks in the cascade while preserving all the positive responses of the original kernel network. Experiments conducted on the challenging and time-demanding change detection task, on very large satellite images, show that our proposed coarse-to-fine approach is effective and highly efficient.
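A schematic Python sketch of the cascade control flow: cheap early stages reject most patterns, and only the survivors reach the expensive, accurate final stage. The stage functions and thresholds here are hypothetical stand-ins; the cross-entropy-based design of the reduced-complexity networks is not reproduced.

```python
import numpy as np

class CoarseToFineCascade:
    """Generic coarse-to-fine cascade: each stage is (score_fn, threshold),
    ordered from cheapest to most expensive. A sample is rejected at the first
    stage whose score falls below the threshold; only survivors reach the
    final (most accurate) stage."""
    def __init__(self, stages):
        self.stages = stages          # list of (score_fn, reject_threshold)

    def predict(self, x):
        score = -np.inf
        for score_fn, thr in self.stages:
            score = score_fn(x)
            if score < thr:           # early rejection: skip deeper stages
                return False, score
        return True, score

# hypothetical stages: a cheap linear score, then a costlier kernel-style score
rng = np.random.default_rng(0)
w_cheap = rng.standard_normal(16)
protos = rng.standard_normal((50, 16))       # support samples of a kernel machine

cheap = lambda x: float(w_cheap @ x)
fine = lambda x: float(np.exp(-np.sum((protos - x) ** 2, axis=1) / 16.0).sum())

cascade = CoarseToFineCascade([(cheap, -0.5), (fine, 5.0)])
print(cascade.predict(rng.standard_normal(16)))
```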
Citations: 37
Fusing Image and Segmentation Cues for Skeleton Extraction in the Wild
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.205
Xiaolong Liu, Pengyuan Lyu, X. Bai, Ming-Ming Cheng
Extracting skeletons from natural images is a challenging problem, due to complex backgrounds in the scene and the various scales of objects. To address this problem, we propose a two-stream fully convolutional neural network which uses the original image and its corresponding semantic segmentation probability map as inputs and predicts the skeleton map using merged multi-scale features. We find that the semantic segmentation probability map is complementary to the corresponding color image and can boost the performance of our baseline model, which was trained only on color images. We conduct experiments on the SK-LARGE dataset; the F-measure of our method on the validation set is 0.738, which significantly outperforms the current state of the art and demonstrates the effectiveness of our proposed approach.
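A toy PyTorch sketch of the two-stream fusion idea: one encoder for the RGB image, one for the semantic segmentation probability map, with the features concatenated and decoded into a per-pixel skeleton map. Layer sizes, the number of segmentation classes, and the absence of multi-scale side outputs are all simplifying assumptions.

```python
import torch
import torch.nn as nn

def small_encoder(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    )

class TwoStreamSkeletonNet(nn.Module):
    """Toy two-stream FCN: one stream for the RGB image, one for the semantic
    segmentation probability map; features are concatenated and decoded into a
    per-pixel skeleton probability map."""
    def __init__(self, num_seg_classes=21):
        super().__init__()
        self.rgb_stream = small_encoder(3)
        self.seg_stream = small_encoder(num_seg_classes)
        self.fuse = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),             # skeleton logits
        )

    def forward(self, rgb, seg_prob):
        f = torch.cat([self.rgb_stream(rgb), self.seg_stream(seg_prob)], dim=1)
        return torch.sigmoid(self.fuse(f))

net = TwoStreamSkeletonNet()
out = net(torch.randn(1, 3, 128, 128), torch.rand(1, 21, 128, 128))
print(out.shape)     # torch.Size([1, 1, 128, 128])
```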
Citations: 18
Dress Like a Star: Retrieving Fashion Products from Videos
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.270
Noa García, George Vogiatzis
This work proposes a system for retrieving clothing and fashion products from video content. Although films and television are the perfect showcase for fashion brands to promote their products, spectators are not always aware of where to buy the latest trends they see on screen. Here, a framework for bridging the gap between fashion products shown in videos and users is presented. By relating clothing items and video frames in an indexed database and performing frame retrieval with temporal aggregation and fast indexing techniques, we can find fashion products from videos in a simple and non-intrusive way. Experiments on a large-scale dataset show that, by using the proposed framework, memory requirements can be reduced by 42.5X with respect to linear search, while accuracy is maintained at around 90%.
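A minimal sketch of the two ingredients named in the abstract: temporal aggregation of per-frame descriptors (so each window is stored once, which is where the memory saving comes from) and a fast nearest-neighbour index over the aggregated entries. The window length, the descriptors, and the use of scikit-learn's kd-tree as the "fast indexing" component are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def temporal_aggregate(frame_feats, window=30):
    """Average consecutive frame descriptors so that each temporal window is
    stored as a single index entry."""
    T, d = frame_feats.shape
    n = T // window
    agg = frame_feats[:n * window].reshape(n, window, d).mean(axis=1)
    return agg / np.linalg.norm(agg, axis=1, keepdims=True)

rng = np.random.default_rng(0)
video_feats = rng.random((3000, 128))             # e.g. per-frame CNN descriptors
index_entries = temporal_aggregate(video_feats)   # 100 entries instead of 3000

# fast-indexing stand-in: a kd-tree over the aggregated descriptors
index = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(index_entries)

query = rng.random((1, 128))                      # descriptor of a product photo
query /= np.linalg.norm(query)
dist, win_ids = index.kneighbors(query)
print(win_ids[0])   # best-matching temporal windows; map back to frame ranges
```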
Citations: 21
Compact Color Texture Descriptor Based on Rank Transform and Product Ordering in the RGB Color Space
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.126
Antonio Fernández, David Lima, F. Bianconi, F. Smeraldi
Color information is generally considered useful for texture analysis. However, an important category of highly effective texture descriptors - namely rank features - has no obvious extension to color spaces, on which no canonical order is defined. In this work, we explore the use of partial orders in conjunction with rank features. We introduce a rank transform based on product ordering, which generalizes the classic rank transform to RGB space through a combined tally of dominated and non-comparable pixels. Experimental results on nine heterogeneous standard databases confirm that our approach outperforms the standard rank transform and its extension to lexicographic and bit-mixing total orders, as well as to the preorders based on the Euclidean distance to a reference color. The low computational complexity and compact codebook size of the transform make it suitable for multi-scale approaches.
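A numpy sketch of a rank transform under the product (component-wise) order on RGB: each neighbour is tallied as dominated by the centre pixel or as non-comparable to it, and the two tallies are combined into a single score. How the paper combines the tallies is not specified here, so the half-weight given to non-comparable neighbours is a hypothetical choice.

```python
import numpy as np

def rgb_rank_transform(img, radius=1, tie_weight=0.5):
    """Rank transform under the product order on RGB. For each pixel,
    neighbours are tallied as 'dominated' (all three channels <= the centre,
    not all equal) or 'non-comparable'; the tallies are combined into one
    score per pixel. The 0.5 tie weight is an assumption, not from the paper.
    img: (H, W, 3) array."""
    H, W, _ = img.shape
    out = np.zeros((H, W), dtype=np.float32)
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
                        for dx in range(-radius, radius + 1)
                        if (dy, dx) != (0, 0)]
    pad = np.pad(img.astype(np.int32),
                 ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    centre = img.astype(np.int32)
    for dy, dx in offsets:
        nb = pad[radius + dy:radius + dy + H, radius + dx:radius + dx + W]
        le = (nb <= centre).all(axis=2)
        ge = (nb >= centre).all(axis=2)
        dominated = le & ~(le & ge)          # strictly below the centre pixel
        incomparable = ~le & ~ge             # neither relation holds
        out += dominated + tie_weight * incomparable
    return out

img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
print(rgb_rank_transform(img).shape)   # (32, 32)
```

With a single channel and tie_weight irrelevant (every pair is comparable), this reduces to the classic grayscale rank transform.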
Citations: 4