
Latest publications from the 2014 22nd International Conference on Pattern Recognition

LBO-Shape Densities: Efficient 3D Shape Retrieval Using Wavelet Density Estimation
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.19
Mark Moyou, Koffi Eddy Ihou, A. Peter
Driven by desirable attributes such as topological characterization and invariance to isometric transformations, the Laplace-Beltrami operator (LBO) and its associated spectrum have been widely adopted by the shape analysis community. Here we demonstrate a novel use of the LBO for shape matching and retrieval by estimating probability densities on its eigenspace, and subsequently using the intrinsic geometry of the density manifold to categorize similar shapes. In our framework, each 3D shape's rich geometric structure, as captured by the low-order eigenvectors of its LBO, is robustly characterized via a nonparametric density estimated directly on these eigenvectors. By utilizing a probabilistic model in which the square root of the density is expanded in a wavelet basis, the space of LBO-shape densities is identifiable with the unit hypersphere. We leverage this simple geometry for retrieval by computing an intrinsic Karcher mean (on the hypersphere of LBO-shape densities) for each shape category, and use the closed-form distance between a query shape and the means to classify shapes. Our method alleviates the need for the superfluous feature extraction schemes required by popular bag-of-features approaches, and experiments demonstrate it to be robust and competitive with the state of the art in 3D shape retrieval.
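The retrieval geometry rests on two closed-form pieces: the arc-length distance between square-root densities on the unit hypersphere, and the intrinsic Karcher mean obtained by iterating log/exp maps. A minimal pure-Python sketch (function names are ours; this is an illustration of the geometry, not the authors' implementation):

```python
import math

def sphere_distance(p, q):
    """Closed-form geodesic distance on the unit hypersphere: arccos(<p, q>)."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))

def log_map(mu, p):
    """Tangent vector at mu pointing toward p, with length d(mu, p)."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(mu, p))))
    d = math.acos(dot)
    if d < 1e-12:
        return [0.0] * len(mu)
    v = [p[i] - dot * mu[i] for i in range(len(mu))]  # component of p orthogonal to mu
    nv = math.sqrt(sum(x * x for x in v))
    return [d * x / nv for x in v]

def exp_map(mu, t):
    """Walk from mu along tangent vector t while staying on the sphere."""
    nt = math.sqrt(sum(x * x for x in t))
    if nt < 1e-12:
        return list(mu)
    return [math.cos(nt) * mu[i] + math.sin(nt) * t[i] / nt for i in range(len(mu))]

def karcher_mean(points, iters=50):
    """Intrinsic (Karcher) mean: average the log-maps, step via the exp-map."""
    mu = list(points[0])
    for _ in range(iters):
        logs = [log_map(mu, p) for p in points]
        avg = [sum(l[i] for l in logs) / len(logs) for i in range(len(mu))]
        mu = exp_map(mu, avg)
    return mu
```

A query is then assigned to the category whose Karcher mean is nearest under `sphere_distance`.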
Citations: 6
Data Sufficiency for Online Writer Identification: A Comparative Study of Writer-Style Space vs. Feature Space Models
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.538
Arti Shivram, Chetan Ramaiah, V. Govindaraju
A key factor in building effective writer identification/verification systems is the amount of data required to build the underlying models. In this research we systematically examine data sufficiency bounds for two broad approaches to online writer identification: feature-space models vs. writer-style-space models. We report results from 40 experiments conducted on two publicly available datasets, and also test identification performance for the target models using two different feature functions. Our findings show that the writer-style-space model gives higher identification performance for a given amount of data and, further, achieves high performance at lower data cost. This model appears to require as few as 20 words per page to achieve identification performance close to 80%, and reaches more than 90% accuracy with higher levels of data enrollment.
Citations: 10
Person Re-identification via Discriminative Accumulation of Local Features
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.681
Tetsu Matsukawa, Takahiro Okabe, Yoichi Sato
Metric learning, which learns a distance metric that distinguishes different people while remaining insensitive to intra-person variations, is widely applied to person re-identification. In previous works, local histograms are densely sampled to extract spatially localized information from each person image. The extracted local histograms are then concatenated into one vector that is used as the input of metric learning. However, the dimensionality of such a concatenated vector often becomes large while the number of training samples is limited. This leads to an overfitting problem. In this work, we argue that this overfitting arises because each local histogram dimension (e.g. a color brightness bin) at the same position is treated separately when examining which part of the image is more discriminative. To solve this problem, we propose a method that analyzes discriminative image positions shared by different local histogram dimensions. A common weight map shared by different dimensions and a distance metric that emphasizes discriminative dimensions in the local histogram are jointly learned with a unified discriminative criterion. Our experiments using four different public datasets confirmed the effectiveness of the proposed method.
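The shared-weight-map idea can be illustrated with a toy distance in which all histogram dimensions at one spatial position share a single weight; this is a hedged sketch of the structure only, not the paper's jointly learned metric:

```python
def weighted_histogram_distance(hists_a, hists_b, weights):
    # Squared distance between two images, each represented as a list of
    # per-position local histograms. Every histogram dimension at a given
    # position shares that position's single spatial weight, instead of
    # each dimension carrying its own independent parameter.
    d = 0.0
    for w, ha, hb in zip(weights, hists_a, hists_b):
        d += w * sum((x - y) ** 2 for x, y in zip(ha, hb))
    return d
```

Tying dimensions to one weight per position is what shrinks the parameter count relative to a full per-dimension metric.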
Citations: 10
Prosodic, Spectral and Voice Quality Feature Selection Using a Long-Term Stopping Criterion for Audio-Based Emotion Recognition
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.148
Markus Kächele, D. Zharkov, S. Meudt, F. Schwenker
Emotion recognition from speech is an important field of research in human-machine interfaces, and has begun to influence everyday life through deployment in areas such as call centers or wearable companions in the form of smartphones. In the proposed classification architecture, different spectral, prosodic, and relatively novel voice quality features are extracted from the speech signals. These features are then used to represent long-term information of the speech, leading to utterance-wise suprasegmental features. The most promising of these features are selected using a forward-selection/backward-elimination algorithm with a novel long-term termination criterion. The overall system has been evaluated using recordings from the public Berlin emotion database. Utilizing the resulting features, a recognition rate of 88.97% has been achieved, which surpasses the performance of humans on this database and is comparable to the state-of-the-art performance on this dataset.
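A generic forward-selection loop with a patience-style stopping rule conveys the flavor of the selection step; the paper's actual long-term termination criterion, backward-elimination phase, and scoring function differ, so treat this as an illustrative stand-in:

```python
def forward_select(features, score, patience=3):
    # Greedy forward selection with a "long-term" stopping rule: keep
    # adding the feature that maximizes the score, but only stop after
    # `patience` consecutive additions fail to beat the best subset seen
    # so far, then return that best subset (rather than stopping at the
    # first local dip in performance).
    selected, remaining = [], list(features)
    best_subset, best_score, stall = [], float("-inf"), 0
    while remaining and stall < patience:
        cand = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(cand)
        remaining.remove(cand)
        s = score(selected)
        if s > best_score:
            best_subset, best_score, stall = list(selected), s, 0
        else:
            stall += 1
    return best_subset
```

Looking past a single bad addition is what distinguishes a long-term criterion from naive first-decline stopping.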
Citations: 21
Spatially-Variant Area Openings for Reference-Driven Adaptive Contour Preserving Filtering
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.189
G. Franchi, J. Angulo
Classical adaptive mathematical morphology is based on operators which locally adapt the structuring elements to the image properties. Connected morphological operators act on the level of the flat zones of an image, such that only flat zones are filtered out, and hence object edges are preserved. Area opening (resp. area closing) is one of the most useful connected operators; it filters out the bright (resp. dark) regions. It intrinsically involves adapting the shape of the structuring element, parameterized by its area. In this paper, we introduce the notion of reference-driven adaptive area opening according to two spatially-variant paradigms. In the first, the area parameter is locally adapted using the reference image. This approach is applied to processing intensity depth images, where the depth image is used to adapt the scale-size processing. The second is a self-dual area opening, where the reference image determines whether the area filter is an opening or a closing, with respect to the relationship between the image and the reference. Its natural application domain is video sequences.
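For intuition, the binary, fixed-parameter special case of an area opening simply deletes bright connected components smaller than the area threshold; the paper's operators are gray-scale and spatially variant, so this sketch only shows the baseline behavior being adapted:

```python
def binary_area_opening(img, min_area):
    # Binary special case of an area opening: remove every bright
    # 4-connected component whose pixel count is below `min_area`.
    # Whole components are kept or deleted, so contours of the
    # surviving components are untouched.
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                stack, comp = [(y, x)], []
                while stack:  # flood fill to collect one component
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(comp) < min_area:  # too small: filter it out
                    for cy, cx in comp:
                        out[cy][cx] = 0
    return out
```

The spatially-variant version replaces the single `min_area` with a per-pixel threshold derived from the reference image.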
Citations: 2
Video Text Extraction Using the Fusion of Color Gradient and Log-Gabor Filter
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.506
Zhike Zhang, Weiqiang Wang, K. Lu
Video text, which contains rich semantic information, can be utilized for video indexing and summarization. However, compared with scanned documents, text recognition for video text is still a challenging problem due to complex backgrounds. Segmenting a text line into single characters before text extraction can achieve higher recognition accuracy, since the background of a single character is less complex than that of a whole text line. Therefore, we first perform character segmentation, which can accurately locate the character gaps in the text line. More specifically, we compute a fusion map which fuses the results of a color gradient and a Log-Gabor filter. Then, candidate segmentation points are obtained by vertical projection analysis of the fusion map. We obtain segmentation points by finding the minimum projection value of candidate points within a limited range. Finally, we obtain the binary image of each single-character image by applying K-means clustering, and combine the results to form the binary image of the whole text line. The binary image is further refined by inward filling and the fusion map. Experimental results on a large amount of data show that the proposed method contributes to better binarization results, which leads to a higher character recognition rate for the OCR engine.
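The vertical projection analysis can be sketched as follows (a simplified illustration on a binary image; the paper operates on the fusion map rather than raw binary pixels):

```python
def vertical_projection(img):
    # Column-wise sum of a binary text-line image (rows x cols);
    # low columns correspond to gaps between characters.
    h, w = len(img), len(img[0])
    return [sum(img[y][x] for y in range(h)) for x in range(w)]

def segmentation_point(proj, lo, hi):
    # Place the cut at the column with the minimum projection value
    # inside the limited search range [lo, hi).
    return min(range(lo, hi), key=lambda x: proj[x])
```

Restricting the search range keeps a noisy global minimum from splitting a character in half.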
Citations: 7
Multi-shot Person Re-identification with Automatic Ambiguity Inference and Removal
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.609
Chunchao Guo, Shi-Zhe Chen, J. Lai, Xiao-Jun Hu, Shi-Chang Shi
This work tackles the challenging problem of multi-shot person re-identification in realistic unconstrained scenarios. While most previous research in the re-identification field is based on the single-shot mode due to the limited scale of conventional datasets, the multi-shot case provides a more natural way for person recognition in surveillance systems. Multiple frames can be easily captured in a camera network, so more complementary information can be extracted for a more robust signature. To re-identify targets in the real world, a commonly occurring key issue, identity ambiguity, must first be solved, which is not considered by most previous studies. During the offline stage, we train an ambiguity classifier based on the shape context extracted from foreground responses in videos. Given a probe pedestrian, this paper employs the offline-trained classifier to recognize and remove ambiguous samples, and then utilizes an improved hierarchical appearance representation to match people across multiple shots. Evaluations of this approach are conducted on two challenging real-world datasets, both newly released in this paper, and yield impressive performance.
Citations: 22
Learning Semantic Binary Codes by Encoding Attributes for Image Retrieval
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.57
Jianwei Luo, Zhi-guo Jiang
This paper addresses the problem of learning semantic, compact binary codes for efficient retrieval in large-scale image collections. Our contributions are three-fold. Firstly, we introduce semantic codes, in which each bit corresponds to an attribute that describes a property of an object (e.g. dogs are furry). Secondly, we propose to use matrix factorization (MF) to learn the semantic codes by encoding attributes. Unlike traditional PCA-based encoding methods, which quantize data onto orthogonal bases, MF places no constraints on the bases, a scheme consistent with the fact that attributes are correlated. Finally, to augment the semantic codes, MF is extended to encode extra non-semantic codes that preserve similarity in the original data space. Evaluations on the a-Pascal dataset show that our method is comparable to the state of the art when using Euclidean distance as ground truth, and even outperforms the state of the art when using class labels as ground truth. Furthermore, in experiments, our method can retrieve images that share the same semantic properties as the query image, which can be used for other vision tasks, e.g. re-training classifiers.
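The retrieval side of such semantic codes is straightforward to sketch: quantize per-attribute scores into bits and rank database images by Hamming distance. This is an illustrative sketch only; the paper learns the codes via matrix factorization, not the naive thresholding shown here:

```python
def to_binary_code(attr_scores, thresh=0.0):
    # Bit k of the semantic code is 1 iff attribute k is predicted
    # present (score above the threshold).
    return tuple(1 if s > thresh else 0 for s in attr_scores)

def hamming(a, b):
    # Retrieval ranks database codes by Hamming distance to the query;
    # matching bits mean shared semantic attributes.
    return sum(x != y for x, y in zip(a, b))
```

Because each bit names an attribute, nearest codes share interpretable properties with the query, which is what enables the re-training use case mentioned above.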
Citations: 0
Gait Recognition Using Flow Histogram Energy Image
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.85
Yazhou Yang, D. Tu, Guohui Li
Human gait is of essential importance owing to its wide use in biometric person-identification applications. In this work, we introduce a novel spatio-temporal gait representation, the Flow Histogram Energy Image (FHEI), to characterize the distinctive motion information of an individual's gait. We first extract Histograms of Optical Flow (HOF) descriptors from each silhouette image of a gait sequence, and construct an FHEI by averaging all the HOF features over a full gait cycle. We also propose a novel approach to generate two different synthetic gait templates. Real and synthetic gait templates are then fused to enhance the recognition accuracy of the FHEI. We further adopt Non-negative Matrix Factorization (NMF) to learn a part-based representation of FHEI templates. Extensive experiments conducted on the USF HumanID gait database indicate that the proposed FHEI approach achieves superior or comparable performance compared with a number of competitive gait recognition algorithms.
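The FHEI construction (a HOF descriptor per silhouette frame, averaged over a gait cycle) might be sketched as follows; the bin count and magnitude-weighted L1 normalization here are our assumptions, not the paper's exact settings:

```python
import math

def hof(flow, bins=8):
    # Histogram of Optical Flow: bin each (dx, dy) flow vector by its
    # orientation, weight by magnitude, then L1-normalise.
    hist = [0.0] * bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue
        ang = math.atan2(dy, dx) % (2 * math.pi)
        hist[int(ang / (2 * math.pi) * bins) % bins] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]

def fhei(flow_frames, bins=8):
    # FHEI: average the per-frame HOF descriptors over one full gait
    # cycle, yielding a single fixed-length motion template.
    hists = [hof(f, bins) for f in flow_frames]
    return [sum(h[i] for h in hists) / len(hists) for i in range(bins)]
```

Averaging over a full cycle makes the template independent of which frame the cycle starts on.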
Citations: 29
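The FHEI construction described in the abstract above — per-frame HOF descriptors averaged over a full gait cycle — can be sketched as follows. This is a simplified, global (non-spatial) variant for illustration only: the paper's FHEI is an image-shaped template built from silhouettes, and the function names and bin count here are hypothetical, not taken from the paper.

```python
import numpy as np

def hof_descriptor(flow, n_bins=9):
    """Histogram of Optical Flow for one frame.

    flow: (H, W, 2) array of per-pixel (dx, dy) motion vectors.
    Returns a magnitude-weighted, L1-normalized orientation histogram."""
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.hypot(dx, dy)                    # flow magnitude per pixel
    ang = np.arctan2(dy, dx)                  # flow direction in [-pi, pi]
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

def fhei(flows):
    """Toy 'Flow Histogram Energy Image': average the per-frame HOF
    descriptors over one full gait cycle."""
    return np.mean([hof_descriptor(f) for f in flows], axis=0)

# Toy gait cycle: 8 frames of random flow in place of real silhouette flow.
rng = np.random.default_rng(0)
cycle = [rng.normal(size=(64, 44, 2)) for _ in range(8)]
descriptor = fhei(cycle)
print(descriptor.shape)  # (9,)
```

Because each per-frame histogram is normalized before averaging, the resulting descriptor also sums to one, which keeps templates from different-length cycles comparable.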
A Low Dimensionality Expression Robust Rejector for 3D Face Recognition
Pub Date : 2014-08-24 DOI: 10.1109/ICPR.2014.96
Jiangning Gao, Mehryar Emambakhsh, A. Evans
In the past decade, expression variations have been one of the most challenging sources of variability in 3D face recognition, especially for scenarios where there are a large number of face samples to discriminate between. In this paper, an expression robust rejector is proposed that first robustly locates landmarks on the relatively stable structure of the nose and its environs, termed the cheek/nose region. Then, by defining curves connecting the landmarks, a small set of features (4 curves with only 15 points each) on the cheek/nose surface is selected using the Bosphorus database. The resulting rejector, which can quickly eliminate a large number of candidates at an early stage, is further evaluated on the FRGC database for both the identification and verification scenarios. The classification performance using only 60 points from 4 curves shows the effectiveness of this efficient expression robust rejector.
Citations: 7
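The early-rejection idea in the abstract above — rank the gallery by a cheap closed-form distance on a 60-point curve feature (4 curves × 15 points) and discard distant candidates before any expensive matching — can be sketched like this. All names are hypothetical, plain Euclidean distance stands in for the paper's closed-form curve distance, and the toy data replaces real FRGC/Bosphorus features.

```python
import numpy as np

def build_gallery_means(features_by_id):
    """Mean curve-feature vector per enrolled identity.

    features_by_id: dict mapping identity -> (n_samples, 60) array,
    where 60 = 4 curves x 15 points each."""
    return {pid: feats.mean(axis=0) for pid, feats in features_by_id.items()}

def reject_candidates(query, means, keep=0.5):
    """Rank identities by distance between the query's curve features and
    each identity's mean, and keep only the closest fraction; the rest are
    rejected early, before any expensive full-face matching."""
    ranked = sorted(means, key=lambda pid: np.linalg.norm(query - means[pid]))
    n_keep = max(1, int(round(len(ranked) * keep)))
    return ranked[:n_keep]

# Toy gallery: three well-separated identities, five noisy samples each.
rng = np.random.default_rng(7)
centers = {pid: rng.normal(scale=10.0, size=60) for pid in ("a", "b", "c")}
gallery = {pid: c + rng.normal(scale=0.1, size=(5, 60)) for pid, c in centers.items()}
means = build_gallery_means(gallery)

query = centers["b"] + rng.normal(scale=0.1, size=60)
kept = reject_candidates(query, means, keep=1 / 3)
print(kept)  # ['b']
```

With a 60-dimensional feature the distance computation is trivially cheap per gallery entry, which is what makes this usable as a pre-filter in front of a slower matcher.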
Journal
2014 22nd International Conference on Pattern Recognition