
2017 IEEE International Conference on Computer Vision Workshops (ICCVW): Latest Publications

Improving a Real-Time Object Detector with Compact Temporal Information
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.31
Martin Ahrnbom, M. B. Jensen, Kalle Åström, M. Nilsson, H. Ardö, T. Moeslund
Neural networks designed for real-time object detection have recently improved significantly, but in practice, looking at only a single RGB image at a time may not be ideal. For example, when detecting objects in videos, a foreground detection algorithm can be used to obtain compact temporal data, which can be fed into a neural network alongside RGB images. We propose an approach for doing this, based on an existing object detector, that re-uses pretrained weights for the processing of RGB images. The neural network was tested on the VIRAT dataset with annotations for object detection, a problem this approach is well suited for. The accuracy was found to improve significantly (up to 66%), with a roughly 40% increase in computational time.
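As a rough illustration of the idea in this abstract, the sketch below (not the authors' actual network or training code; the MOG2 background subtractor and all parameter values are assumptions) derives a compact temporal channel from a foreground-detection step and stacks it with the RGB frame, so a detector could consume a 4-channel input:

```python
import cv2
import numpy as np

def rgb_plus_foreground(video_path):
    """Yield H x W x 4 arrays: RGB plus one compact temporal (foreground) channel."""
    cap = cv2.VideoCapture(video_path)
    bg_sub = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        fg_mask = bg_sub.apply(frame_bgr)               # 0/255 foreground mask from motion
        fg_mask = cv2.GaussianBlur(fg_mask, (5, 5), 0)  # soften the mask into a smoother cue
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
        temporal = (fg_mask.astype(np.float32) / 255.0)[..., None]
        yield np.concatenate([rgb, temporal], axis=-1)  # 4-channel input for the detector
    cap.release()
```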
Cited by: 4
Feature Learning with Rank-Based Candidate Selection for Product Search
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.44
Y. Kuo, Winston H. Hsu
Nowadays, more and more people buy products via e-commerce websites. We can not only compare prices from different online retailers but also obtain useful review comments from other customers. In particular, people tend to search for visually similar products when they are looking for possible candidates. The need for product search is emerging. To tackle the problem, recent works integrate different additional information (e.g., attributes, image pairs, category) with deep convolutional neural networks (CNNs) for solving cross-domain image retrieval and product search. Based on the state-of-the-art approaches, we propose a rank-based candidate selection scheme for feature learning. Given a query image, we attempt to push hard negative (irrelevant) images away from queries and bring ambiguous positive (relevant) images close to queries. We investigate the effects of global and attention-based local features on the proposed method, and achieve a 15.8% relative gain for product search.
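A minimal sketch of what rank-based candidate selection could look like (illustrative only; the function names, margin and k are assumptions, not the authors' exact procedure): rank candidates by distance to the query, keep the closest negatives and the farthest positives, and feed them to a triplet-style loss.

```python
import numpy as np

def select_candidates(query, positives, negatives, k=5):
    """query: (d,), positives: (P, d), negatives: (N, d) embedding vectors."""
    pos_dist = np.linalg.norm(positives - query, axis=1)
    neg_dist = np.linalg.norm(negatives - query, axis=1)
    hard_pos = positives[np.argsort(-pos_dist)[:k]]  # farthest (ambiguous) positives
    hard_neg = negatives[np.argsort(neg_dist)[:k]]   # closest (hard) negatives
    return hard_pos, hard_neg

def triplet_loss(query, hard_pos, hard_neg, margin=0.2):
    """Push hard negatives away from the query and pull ambiguous positives towards it."""
    d_pos = np.linalg.norm(hard_pos - query, axis=1).mean()
    d_neg = np.linalg.norm(hard_neg - query, axis=1).mean()
    return max(0.0, d_pos - d_neg + margin)
```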
Cited by: 8
moM: Mean of Moments Feature for Person Re-identification
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.154
Mengran Gou, O. Camps, M. Sznaier
Person re-identification (re-id) has drawn significant attention over the past decade. The design of view-invariant feature descriptors is one of the most crucial problems for this task. Covariance descriptors have often been used in person re-id because of their invariance properties. More recently, a new state-of-the-art performance was achieved by also including first-order moments and two-level Gaussian descriptors. However, using second-order or lower-order moment information might not be enough when the feature distribution is not Gaussian. In this paper, we address this limitation by using the empirical (symmetric positive definite) moment matrix to incorporate higher-order moments and by applying the on-manifold mean to pool the features along horizontal strips. The new descriptor, based on the on-manifold mean of a moment matrix (moM), can be used to approximate more complex, non-Gaussian distributions of the pixel features within a mid-sized local patch. We have evaluated the proposed feature on five widely used re-id datasets. The experiments show that the moM and hierarchical Gaussian descriptor (GOG) [30] features complement each other and that using a combination of both features achieves performance comparable with the state-of-the-art methods.
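The empirical moment matrix idea can be illustrated with a simplified sketch (the monomial degree, the ridge regularization, and the omission of the on-manifold pooling step are assumptions made here for brevity):

```python
import numpy as np
from itertools import combinations_with_replacement

def monomials(x, degree=2):
    """Monomial lift of a feature vector x (constant term plus all monomials up to `degree`)."""
    terms = [1.0] + list(x)
    for d in range(2, degree + 1):
        terms += [np.prod(c) for c in combinations_with_replacement(x, d)]
    return np.array(terms)

def moment_matrix(patch_features, degree=2):
    """patch_features: (n_pixels, d). Returns the empirical (SPD) moment matrix."""
    lifted = np.stack([monomials(f, degree) for f in patch_features])
    M = lifted.T @ lifted / len(lifted)   # E[m(x) m(x)^T] holds moments up to order 2*degree
    return M + 1e-6 * np.eye(M.shape[0])  # small ridge keeps the matrix positive definite
```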
Cited by: 15
Continuous Gesture Recognition with Hand-Oriented Spatiotemporal Feature
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.361
Zhipeng Liu, Xiujuan Chai, Zhuang Liu, Xilin Chen
In this paper, an efficient spotting-recognition framework is proposed to tackle large-scale continuous gesture recognition from RGB-D input. Concretely, continuous gestures are first segmented into isolated gestures based on accurate hand positions obtained by a two-stream Faster R-CNN hand detector. In the subsequent recognition stage, a hand-oriented spatiotemporal (ST) feature is extracted for each isolated gesture video by a 3D convolutional network (C3D). This feature considers only the hand regions and the face location, which effectively blocks the negative influence of distractors such as the background, clothing and the rest of the body. Next, the features extracted from the calibrated RGB and depth channels are fused to boost representative power, and the final classification is performed with a simple linear SVM. Extensive experiments are conducted on the validation and test sets of the Continuous Gesture Dataset (ConGD) to validate the effectiveness of the proposed recognition framework. Our method achieves promising performance with a mean Jaccard Index of 0.6103 and outperforms the other entries in the ChaLearn LAP Large-scale Continuous Gesture Recognition Challenge.
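The fusion-then-SVM step described above can be sketched as follows (the C3D feature extraction is assumed to have already produced pooled per-video vectors; array names and the C value are illustrative, not the authors' settings):

```python
import numpy as np
from sklearn.svm import LinearSVC

def fuse_and_train(rgb_feats, depth_feats, labels):
    """rgb_feats, depth_feats: (n_videos, d) pooled spatiotemporal features per stream."""
    fused = np.concatenate([rgb_feats, depth_feats], axis=1)  # simple feature-level fusion
    clf = LinearSVC(C=1.0)
    clf.fit(fused, labels)
    return clf

# usage (hypothetical arrays):
# clf = fuse_and_train(rgb_train, depth_train, y_train)
# preds = clf.predict(np.concatenate([rgb_test, depth_test], axis=1))
```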
Cited by: 49
LBP-Flow and Hybrid Encoding for Real-Time Water and Fire Classification
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.56
Konstantinos Avgerinakis, Panagiotis Giannakeris, A. Briassouli, A. Karakostas, S. Vrochidis, Y. Kompatsiaris
The analysis of dynamic scenes in video is a very useful task, especially for the detection and monitoring of natural hazards such as floods and fires. In this work, we focus on the challenging problem of real-world dynamic scene understanding, where videos contain dynamic textures that have been recorded in the "wild". These videos feature large illumination variations, complex motion, occlusions, camera motion, as well as significant intra-class differences, as the motion patterns of dynamic textures of the same category may be subject to large variations in real-world recordings. We address these issues by introducing a novel dynamic texture descriptor, the "Local Binary Pattern-flow" (LBP-flow), which is shown to be able to accurately classify dynamic scenes whose complex motion patterns are difficult to separate using existing local descriptors, or which cannot be modelled by statistical techniques. LBP-flow builds upon existing Local Binary Pattern (LBP) descriptors by providing a low-cost representation of both appearance and optical flow textures, to increase its representation capabilities. The descriptor statistics are encoded with the Fisher vector, an informative mid-level descriptor, while a neural network follows to reduce the dimensionality and increase the discriminability of the encoded descriptor. The proposed algorithm leads to a highly accurate spatio-temporal descriptor with a very low computational cost, enabling its deployment in real-world surveillance and security applications. Experiments on challenging benchmark datasets demonstrate that it achieves recognition accuracy that surpasses state-of-the-art dynamic texture descriptors.
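A hedged sketch of an LBP-flow style descriptor for one frame pair (the LBP parameters, the Farneback flow settings, and the omission of the Fisher-vector encoding and the neural network are simplifications, not the authors' configuration):

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(channel, P=8, R=1):
    """Uniform LBP histogram of one 8-bit single-channel image."""
    lbp = local_binary_pattern(channel, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def lbp_flow_descriptor(prev_gray, curr_gray):
    """Concatenate LBP statistics of appearance and of optical-flow magnitude."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    mag = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return np.concatenate([lbp_hist(curr_gray), lbp_hist(mag)])
```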
Cited by: 2
Visual Music Transcription of Clarinet Video Recordings Trained with Audio-Based Labelled Data
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.62
E. Gómez, P. Arias, Pablo Zinemanas, G. Haro
Automatic transcription is a well-known task in the music information retrieval (MIR) domain, and consists of computing a symbolic music representation (e.g. MIDI) from an audio recording. In this work, we address the automatic transcription of video recordings when the audio modality is missing or of insufficient quality, and thus analyze the visual information. We focus on the clarinet, which is played by opening/closing a set of holes and keys. We propose a method for automatic visual note estimation that detects the fingertips of the player and measures their displacement with respect to the holes and keys of the clarinet. To this aim, we track the clarinet and determine its position in every frame. The relative positions of the fingertips are used as features of a machine learning algorithm trained for note pitch classification. For that purpose, a dataset is built in a semi-automatic way by estimating pitch information from the audio signals of an existing collection of 4.5 hours of video recordings, covering six different songs performed by nine different players. Our results confirm the difficulty of visual compared to audio automatic transcription, mainly due to motion blur and occlusions that cannot be resolved with a single view.
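A toy sketch of the feature construction described above (the hole/key coordinates, the normalization by clarinet length, and the random-forest classifier are assumptions; the paper does not specify this exact setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fingertip_features(fingertips, holes, clarinet_length):
    """fingertips: (F, 2), holes: (H, 2) coordinates expressed in the tracked clarinet frame."""
    offsets = fingertips[:, None, :] - holes[None, :, :]  # F x H x 2 displacements
    return (offsets / clarinet_length).reshape(-1)        # scale-normalized feature vector

# usage with audio-derived labels (hypothetical per-frame data):
# X = np.stack([fingertip_features(f, h, L) for f, h, L in frames])
# clf = RandomForestClassifier(n_estimators=200).fit(X, midi_pitches)
```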
Cited by: 3
An Accurate System for Fashion Hand-Drawn Sketches Vectorization
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.268
Luca Donati, Simone Cesano, A. Prati
Automatic vectorization of fashion hand-drawn sketches is a crucial task performed by fashion industries to speed up their workflows. Performing vectorization on hand-drawn sketches is not an easy task, and it requires a first crucial step that consists of extracting precise and thin lines from sketches that are potentially very diverse (depending on the tool used and on the designer's capabilities and preferences). This paper proposes a system for automatic vectorization of fashion hand-drawn sketches based on Pearson's Correlation Coefficient with multiple Gaussian kernels in order to enhance and extract curvilinear structures in a sketch. The use of correlation grants invariance to image contrast and lighting, making the extracted lines more reliable for vectorization. Moreover, the proposed algorithm has been designed to equally extract both thin and wide lines with varying stroke hardness, which are common in fashion hand-drawn sketches. It also works for crossing lines and adjacent parallel lines, and needs very few parameters (if any) to run. The efficacy of the proposal has been demonstrated on both hand-drawn sketches and images with added artificial noise, showing excellent performance with respect to the state of the art in both cases.
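The correlation idea can be sketched as follows (kernel sizes and sigmas are assumptions): OpenCV's TM_CCOEFF_NORMED template matching computes the normalized, contrast- and lighting-invariant correlation coefficient, so sliding Gaussian kernels of several widths and keeping the per-pixel maximum gives a multi-width stroke response.

```python
import cv2
import numpy as np

def gaussian_kernel(size, sigma):
    k = cv2.getGaussianKernel(size, sigma)
    return (k @ k.T).astype(np.float32)

def line_response(gray_sketch, sizes=(5, 9, 13)):
    """Per-pixel maximum Pearson correlation with Gaussian kernels of several widths."""
    img = 255.0 - gray_sketch.astype(np.float32)  # invert so ink strokes become bright ridges
    response = None
    for s in sizes:                               # several kernels cover thin and wide strokes
        r = cv2.matchTemplate(img, gaussian_kernel(s, s / 3.0), cv2.TM_CCOEFF_NORMED)
        r = cv2.copyMakeBorder(r, s // 2, s // 2, s // 2, s // 2,
                               cv2.BORDER_CONSTANT, value=0)
        response = r if response is None else np.maximum(response, r)
    return response                               # high where some kernel width matches a stroke
```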
Cited by: 19
Can We Speed up 3D Scanning? A Cognitive and Geometric Analysis
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.317
Karthikeyan Vaiapury, B. Purushothaman, A. Pal, Swapna Agarwal
The paper proposes a cognitively inspired change detection method for detecting and localizing shape variations on point clouds. A well-defined pipeline is introduced based on a coarse-to-fine approach: i) shape segmentation, ii) fine segment registration using attention blocks. Shape segmentation is obtained using a covariance-based method, and fine segment registration is carried out using a gravitational registration algorithm. In particular, the introduction of this partition-based approach using a visual attention mechanism improves the speed of deformation detection and localization. Results are shown on synthetic data of house and aircraft models. Experimental results show that this simple yet effective approach, designed with an eye to scalability, can detect and localize deformation quickly. A real-world car use case is also presented, with preliminary promising results useful for auditing and insurance claim tasks.
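A hedged sketch of a covariance-based region descriptor for point-cloud segments (the XYZ-only feature choice and the log-Euclidean comparison are assumptions; the paper's attention blocks and gravitational registration are not reproduced here):

```python
import numpy as np
from scipy.linalg import logm

def region_covariance(points):
    """points: (n, 3) XYZ coordinates of one segment; returns its 3x3 covariance descriptor."""
    return np.cov(points, rowvar=False) + 1e-9 * np.eye(3)

def log_euclidean_distance(cov_a, cov_b):
    """Compare two covariance descriptors; large values flag a possibly deformed segment."""
    diff = np.real(logm(cov_a)) - np.real(logm(cov_b))
    return float(np.linalg.norm(diff, ord="fro"))
```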
Cited by: 1
Efficient Separation Between Projected Patterns for Multiple Projector 3D People Scanning
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.101
T. Petković, T. Pribanić, M. Donlic, P. Sturm
Structured light 3D surface scanners usually comprise one projector and one camera, which provide a limited view of the object's surface. Multiple projectors and cameras must be used to reconstruct the whole surface profile. Using multiple projectors in structured light profilometry is a challenging problem due to inter-projector interference, which makes pattern separation difficult. We propose the use of sinusoidal fringe patterns where each projector has its own specifically chosen set of temporal phase shifts, which together comprise a DFT_{2P+1} basis, where P is the number of projectors. Such a choice enables simple and efficient separation between projected patterns. The proposed method imposes no limit on the number of projectors used or on their placement. We demonstrate the applicability of the proposed method on a structured light system with three projectors and six cameras for human body scanning.
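An illustrative sketch of the separation principle (the exact frequency assignment per projector is an assumption): if projector p shifts its fringe by 2*pi*p*n/(2P+1) in frame n, then over N = 2P+1 frames each projector occupies its own temporal DFT bin, and a per-pixel DFT over the captured stack separates the patterns.

```python
import numpy as np

def projector_patterns(width, p, num_projectors, fringe_period=32):
    """One fringe row per frame for projector p (1-based); tile vertically for full patterns."""
    N = 2 * num_projectors + 1
    x = np.arange(width)
    spatial_phase = 2 * np.pi * x / fringe_period
    return [0.5 + 0.5 * np.cos(spatial_phase + 2 * np.pi * p * n / N) for n in range(N)]

def separate(captured_stack, num_projectors):
    """captured_stack: (N, H, W) frames; DFT bin p isolates projector p's fringe signal."""
    spectrum = np.fft.fft(captured_stack, axis=0)
    return {p: spectrum[p] for p in range(1, num_projectors + 1)}
```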
Cited by: 13
View-Invariant Gait Representation Using Joint Bayesian Regularized Non-negative Matrix Factorization
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.303
M. Babaee, G. Rigoll
Gait as a biometric feature has been investigated for human identification and biometric applications. However, gait is highly dependent on the view angle; consequently, previously proposed gait features do not perform well when a person changes orientation with respect to the camera. To tackle this problem, we propose a new method to learn low-dimensional view-invariant gait features for person identification/verification. We model a gait observed from several different points of view as a Gaussian distribution and then use a Joint Bayesian function as a regularizer, coupled with the main objective function of non-negative matrix factorization, to map gait features into a low-dimensional space. This process leads to an informative gait feature that can be used in a verification task. Experiments performed on a large gait dataset confirm the strength of the proposed method.
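A simplified sketch of the factorization step (the Joint Bayesian regularizer and the Gaussian view model are omitted; this only shows a plain NMF embedding of gait energy images as a baseline, with assumed names and dimensions):

```python
import numpy as np
from sklearn.decomposition import NMF

def gait_embedding(gait_energy_images, n_components=64):
    """gait_energy_images: (n_samples, n_pixels) non-negative matrix of flattened GEIs."""
    model = NMF(n_components=n_components, init="nndsvda", max_iter=500)
    codes = model.fit_transform(gait_energy_images)  # low-dimensional non-negative gait codes
    return codes, model.components_                  # codes and the learned basis
```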
Cited by: 8