
2015 IEEE International Conference on Multimedia and Expo (ICME): Latest Publications

Vectors of locally aggregated centers for compact video representation
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177501
Alhabib Abbas, N. Deligiannis, Y. Andreopoulos
We propose a novel vector aggregation technique for compact video representation, with application in accurate similarity detection within large video datasets. The current state of the art in visual search is the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim of increasing robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video-segment similarity detection. Experimentation using a video dataset comprising more than 1000 minutes of content from the Open Video Project shows that VLAC obtains substantial gains in mean Average Precision (mAP) over VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.
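To make the two-stage aggregation concrete, here is a minimal sketch of VLAC encoding, assuming per-frame SIFT descriptors have already been extracted. The cluster counts and the use of scikit-learn's KMeans are illustrative choices, not the authors' exact configuration.

```python
# Hedged sketch of VLAC: cluster a segment's SIFT descriptors into LFCs,
# then aggregate LFC-to-CLFC residuals (VLAD-style, but over centers).
import numpy as np
from sklearn.cluster import KMeans

def train_clfcs(training_descriptors, n_lfcs=32, n_clfcs=16, seed=0):
    """Learn centers of local feature centers (CLFCs) from a training set."""
    lfcs = []
    for desc in training_descriptors:  # one (n_sift, 128) array per training segment
        km = KMeans(n_clusters=n_lfcs, n_init=4, random_state=seed).fit(desc)
        lfcs.append(km.cluster_centers_)
    lfcs = np.vstack(lfcs)
    return KMeans(n_clusters=n_clfcs, n_init=4, random_state=seed).fit(lfcs).cluster_centers_

def vlac(segment_descriptors, clfcs, n_lfcs=32, seed=0):
    """Encode one video segment as a single VLAC vector."""
    km = KMeans(n_clusters=n_lfcs, n_init=4, random_state=seed).fit(segment_descriptors)
    lfcs = km.cluster_centers_                     # local feature centers of this segment
    # assign each LFC to its nearest CLFC
    assign = np.argmin(((lfcs[:, None] - clfcs[None]) ** 2).sum(-1), axis=1)
    v = np.zeros_like(clfcs)
    for k in range(clfcs.shape[0]):                # sum of differences per CLFC
        members = lfcs[assign == k]
        if len(members):
            v[k] = (members - clfcs[k]).sum(axis=0)
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)         # L2-normalize, as is usual for VLAD
```

Because the residuals are computed between centers rather than raw descriptors, the code length depends only on the number of CLFCs, which is what makes the final description so compact.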
Citations: 9
Inter-frame dependent rate-distortion optimization using Lagrangian multiplier adaption
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177467
Shuai Li, Ce Zhu, Yanbo Gao, Yimin Zhou, F. Dufaux, Ming-Ting Sun
It is known that, in the current hybrid video coding structure, spatial and temporal prediction techniques are used extensively, which introduces strong dependency among coding units. Such dependency poses a great challenge to performing global rate-distortion optimization (RDO) when encoding a video sequence. RDO is usually performed so that the coding efficiency of each coding unit is optimized independently, without considering the dependency among coding units, leading to a suboptimal coding result for the whole sequence. In this paper, we investigate inter-frame dependent RDO, where the impact of the coding performance of the current coding unit on that of the following frames is considered. Accordingly, an inter-frame dependent rate-distortion optimization scheme is proposed and implemented on the newest video coding standard, the High Efficiency Video Coding (HEVC) platform. Experimental results show that the proposed scheme achieves about 3.19% BD-rate saving on average over the state-of-the-art HEVC codec (HM15.0) in the low-delay B coding structure, with no extra encoding time. It obtains a significantly higher coding gain than the multiple-QP (±3) optimization technique, which increases the encoding time by a factor of about six. Coupled with multiple-QP optimization, the proposed scheme further achieves higher BD-rate savings of 5.57% and 4.07% on average over the HEVC codec and the multiple-QP-enabled HEVC codec, respectively.
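The core mechanism is easy to state: each coding unit minimizes the Lagrangian cost J = D + λR, and the multiplier λ is adapted downward when the unit's distortion propagates to later frames, so more bits are spent where they pay off downstream. The sketch below illustrates that idea only; the propagation factor and the candidate (distortion, rate) pairs are hypothetical inputs, and the paper's actual adaptation rule is more elaborate.

```python
# Illustrative inter-frame dependent mode selection with an adapted lambda.
def adapted_lambda(base_lambda, propagation_factor):
    # propagation_factor >= 0: fraction of this unit's distortion inherited
    # by future frames through temporal prediction (assumed measurable)
    return base_lambda / (1.0 + propagation_factor)

def best_mode(candidates, lam):
    """candidates: list of (mode, distortion, rate); pick min J = D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# A heavily referenced frame gets a smaller lambda, favoring lower distortion.
lam = adapted_lambda(base_lambda=120.0, propagation_factor=0.6)
mode, dist, rate = best_mode([("intra", 900.0, 48), ("inter", 1100.0, 12)], lam)
```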
Citations: 12
SegBOMP: An efficient algorithm for block non-sparse signal recovery
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177503
Xushan Chen, Xiongwei Zhang, Jibin Yang, Meng Sun, Li Zeng
Block-sparse signal recovery methods, which take the block structure of the nonzero coefficients into account when clustering, have attracted great interest. Compared with traditional compressive sensing methods, they can obtain better recovery performance with fewer measurements by exploiting the block sparsity explicitly. In this paper we propose a segmented version of the block orthogonal matching pursuit (BOMP) algorithm that divides any vector into several sparse sub-vectors. By doing this, the original method can be significantly accelerated, owing to the reduced measurement dimension for each segmented vector. Experimental results show that, at low complexity, the proposed method yields reconstruction performance identical to or even better than conventional methods that treat the signal in the standard block-sparse fashion. Furthermore, in the specific case where not all segments contain nonzero blocks, the performance improvement can be interpreted as a gain in “effective SNR” in noisy environments.
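As a rough illustration of the building block (not the paper's exact segmented algorithm), the sketch below implements a generic block orthogonal matching pursuit for one measurement vector; in SegBOMP this routine would be applied per segment with a correspondingly smaller sensing matrix. The Gaussian sensing matrix, block size, and sparsity level are illustrative.

```python
# Generic block OMP: greedily pick the block of columns that best explains
# the residual, then re-fit on the accumulated support by least squares.
import numpy as np

def bomp(y, A, block_size, n_blocks_to_pick):
    """Recover a block-sparse x from y = A x."""
    n = A.shape[1]
    blocks = [np.arange(b, min(b + block_size, n)) for b in range(0, n, block_size)]
    support, residual = [], y.copy()
    for _ in range(n_blocks_to_pick):
        scores = [np.linalg.norm(A[:, idx].T @ residual) for idx in blocks]
        support.append(int(np.argmax(scores)))
        cols = np.concatenate([blocks[b] for b in sorted(set(support))])
        x_ls, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        residual = y - A[:, cols] @ x_ls
    x = np.zeros(n)
    x[cols] = x_ls
    return x

rng = np.random.default_rng(0)
x_true = np.zeros(64); x_true[8:12] = rng.normal(size=4)   # one nonzero block
A = rng.normal(size=(24, 64)) / np.sqrt(24)
x_hat = bomp(A @ x_true, A, block_size=4, n_blocks_to_pick=1)
```

The speed-up claimed in the paper comes from the segmentation: each sub-problem sees a shorter vector and fewer measurements, so the per-iteration correlation and least-squares steps shrink accordingly.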
Citations: 0
Learning Gaussian mixture model for saliency detection on face images
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177465
Yun Ren, Mai Xu, Ruihan Pan, Zulin Wang
Previous work has demonstrated that integrating top-down features into bottom-up saliency methods can improve saliency prediction accuracy. Therefore, for face images, this paper proposes a saliency detection method based on a Gaussian mixture model (GMM), which learns the distribution of saliency over face regions as the top-down feature. Specifically, we verify that fixations tend to cluster around facial features when viewing images with large faces. Thus, the GMM is learnt from eye-tracking fixation data to establish the distribution of saliency over faces. Then, in our method, the top-down feature from the learnt GMM is combined with conventional bottom-up features (i.e., color, intensity, and orientation) for saliency detection. Finally, experimental results validate that our method improves the accuracy of saliency prediction for face images.
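A minimal sketch of the pipeline, assuming recorded fixations on face images are available as (x, y) points: fit a GMM to the fixation cloud as the top-down term and blend it with any bottom-up map. The number of mixture components and the blending weight are illustrative, not the paper's tuned values.

```python
# Top-down saliency from a GMM fitted on eye-tracking fixations,
# blended with a precomputed bottom-up saliency map.
import numpy as np
from sklearn.mixture import GaussianMixture

def topdown_map(fixations_xy, height, width, n_components=4):
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(fixations_xy)
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    density = np.exp(gmm.score_samples(grid)).reshape(height, width)
    return density / density.max()

def combine(bottom_up, top_down, alpha=0.5):
    """Linear blend of bottom-up (color/intensity/orientation) and top-down maps."""
    s = alpha * top_down + (1 - alpha) * bottom_up
    return s / s.max()
```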
Citations: 2
What is the next step of binary features?
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177429
Zhendong Mao, Lei Zhang, Bin Wang, Li Guo
Various binary features have recently been proposed in the literature, aiming to improve the computational and storage efficiency of image retrieval applications. However, the most common way of using binary features is a voting strategy based on brute-force matching: binary features are discrete data points distributed in Hamming space, so clustering-based models such as bag-of-words (BoW) are unsuitable for them. Although indexing mechanisms substantially decrease the time cost, the brute-force matching strategy remains a bottleneck that restricts the performance of binary features. To address this issue, we propose a simple but effective method, namely COIP (Coding by Order-independent Projection), which projects binary features into a binary code of limited bits. As a result, each image is represented by one single binary code that can be indexed for computational and storage efficiency. We prove that the COIP codes of two images are similar with probability proportional to the ratio of their matched features. A comprehensive evaluation against several state-of-the-art binary features is performed on a benchmark dataset. Experimental results reveal that, for binary-feature-based image retrieval, our approach improves storage efficiency by one order of magnitude and time efficiency by two, while retrieval performance remains almost unchanged.
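The abstract does not spell out the projection itself, so the following is only a hedged illustration of the "order-independent" requirement: any permutation-invariant pooling of a set of binary descriptors (below, a per-bit majority vote) yields one fixed-length binary code per image that can be indexed and compared by Hamming distance. This is a stand-in for intuition, not the authors' COIP construction.

```python
# Permutation-invariant pooling of binary descriptors into one code.
import numpy as np

def order_independent_code(binary_descriptors):
    """binary_descriptors: (n_features, n_bits) array of 0/1 values."""
    bit_means = binary_descriptors.mean(axis=0)   # invariant to feature order
    return (bit_means >= 0.5).astype(np.uint8)    # per-bit majority vote

def hamming(a, b):
    """Distance between two image-level codes."""
    return int(np.count_nonzero(a != b))
```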
Citations: 0
An adaptive detecting strategy against motion vector-based steganography
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177410
Peipei Wang, Yun Cao, Xianfeng Zhao, Haibo Yu
The goal of this paper is to improve the performance of current video steganalysis in detecting motion vector (MV)-based steganography. We note that many MV-based approaches embed secret bits in a content-adaptive manner. Typically, the modifications are applied only to qualified MVs, which implies that the number of modified MVs varies among frames after embedding. On the other hand, nearly all current steganalytic methods ignore this uneven distribution: they divide the video into equal frame groups and calculate each feature vector using all MVs within one group. For better classification performance, we suggest performing steganalysis in an adaptive way as well. First, divide the video into groups of variable length according to frame dynamics. Then, within each group, calculate a single feature vector using all suspicious MVs (MVs that are likely to have been modified). The experimental results show the effectiveness of our proposed strategy.
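A sketch of the adaptive grouping step, assuming a per-frame dynamics score (e.g., mean MV magnitude) is available: a new group starts whenever the dynamics change by more than a threshold, so group lengths follow the content. The threshold and the notion of a "suspicious" MV are illustrative placeholders for the paper's criteria.

```python
# Variable-length frame grouping plus a per-group feature over suspicious MVs.
import numpy as np

def variable_length_groups(frame_dynamics, threshold=0.5):
    groups, start = [], 0
    for i in range(1, len(frame_dynamics)):
        if abs(frame_dynamics[i] - frame_dynamics[i - 1]) > threshold:
            groups.append((start, i))   # [start, i) is one group
            start = i
    groups.append((start, len(frame_dynamics)))
    return groups

def group_feature(mvs_per_frame, group, is_suspicious):
    """mvs_per_frame: list of (n_i, d) arrays; is_suspicious: MV -> bool."""
    start, end = group
    mvs = np.vstack(mvs_per_frame[start:end])              # all MVs in the group
    mask = np.array([is_suspicious(mv) for mv in mvs])
    return mvs[mask].mean(axis=0) if mask.any() else np.zeros(mvs.shape[1])
```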
Citations: 5
Temporal spotting of human actions from videos containing actor's unintentional motions
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177481
K. Hara, Kazuaki Nakamura, N. Babaguchi
This paper proposes a method for temporal action spotting: the temporal segmentation and classification of human actions in videos. Naturally performed human actions often involve the actor's unintentional motions. These unintentional motions yield false visual evidence in the videos, which is unrelated to the performed actions and degrades the performance of temporal action spotting. To deal with this problem, our proposed method employs a voting-based approach in which the temporal relation between each action and its visual evidence is probabilistically modeled as a voting score function. With this approach, our method can robustly spot the target actions even when they involve several unintentional motions, because the effect of the false visual evidence yielded by unintentional motions can be canceled by the other visual evidence observed with the target actions. Experimental results show that the proposed method is highly robust to unintentional motions.
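A hedged sketch of the voting idea: each piece of visual evidence casts probabilistic votes for the action's start time according to a learned temporal-offset distribution, and false evidence from unintentional motions is outvoted at the accumulator's peaks. The offset distribution and the peak threshold below are illustrative, not the paper's learned quantities.

```python
# Accumulate temporal votes from evidence detections, then threshold.
import numpy as np

def vote(evidence_times, offset_probs, n_frames):
    """offset_probs[d] = P(action starts d frames before the evidence)."""
    acc = np.zeros(n_frames)
    for t in evidence_times:
        for d, p in enumerate(offset_probs):
            if 0 <= t - d < n_frames:
                acc[t - d] += p
    return acc

def spot(acc, threshold):
    """Frames whose accumulated voting score indicates an action onset."""
    return [t for t in range(len(acc)) if acc[t] >= threshold]
```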
Citations: 2
Characterizing dynamic textures with space-time lacunarity analysis
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177482
Yuping Sun, Yong Xu, Yuhui Quan
This paper addresses the challenge of reliably capturing the temporal characteristics of local space-time patterns in dynamic textures (DTs). A powerful DT descriptor is proposed that enjoys strong robustness to viewpoint changes, illumination changes, and video deformation. Observing that local DT patterns are distributed spatio-temporally with stationary irregularities, we propose to characterize the distributions of local binarized DT patterns along both the temporal and the spatial axes via lacunarity analysis. We also observe that such irregularities are similar across DT slices along the same axis but distinct between axes. Thus, the resulting lacunarity-based features are averaged along each axis and concatenated to form the final DT descriptor. We applied the proposed descriptor to DT classification and evaluated its performance on several benchmark datasets. The experimental results demonstrate the power of the proposed descriptor in comparison with existing ones.
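For readers unfamiliar with lacunarity: the standard gliding-box statistic is Λ(r) = E[M²] / E[M]², where M is the box "mass" (count of ones) in a window of size r. A minimal sketch for one binarized pattern sequence along a single axis follows; the window sizes are illustrative, and the paper applies this along both spatial and temporal axes and averages over slices.

```python
# Gliding-box lacunarity of a binary sequence at several box sizes.
import numpy as np

def lacunarity(binary_seq, box_size):
    # box masses via a sliding-window sum
    masses = np.convolve(binary_seq.astype(float), np.ones(box_size), mode="valid")
    m1, m2 = masses.mean(), (masses ** 2).mean()
    return m2 / (m1 ** 2 + 1e-12)          # Lambda(r) = E[M^2] / E[M]^2

def lacunarity_feature(binary_seq, box_sizes=(2, 4, 8, 16)):
    return np.array([lacunarity(binary_seq, r) for r in box_sizes])
```

Larger Λ values indicate gappier, more heterogeneous distributions of the binarized patterns, which is exactly the irregularity the descriptor aggregates per axis.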
Citations: 9
Cross-media hashing with Centroid Approaching
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177473
Ruoyu Liu, Yao Zhao, Shikui Wei, Zhenfeng Zhu
Cross-media retrieval has received increasing interest in recent years; it aims to address the semantic correlation issues within rich media. Two key aspects, cross-media representation and indexing, have been studied to deal with cross-media similarity measurement and the scalability issue, respectively. In this paper, we propose a new cross-media hashing scheme, called Centroid Approaching Cross-Media Hashing (CAMH), to handle both cross-media representation and indexing simultaneously. Unlike existing indexing methods, the proposed method introduces semantic category information into the learning procedure, leading to more exact hash codes for instances of multiple media types. In addition, we present a comparative study of cross-media indexing methods under a unified evaluation framework. Extensive experiments on two commonly used datasets demonstrate good performance in terms of search accuracy and time complexity.
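A heavily simplified sketch of the "centroid approaching" idea, under the assumption that features from both media types are projected to a shared Hamming space: each semantic class gets a binary centroid, and training would pull an instance's code toward its class centroid. The random projections and the majority-vote centroids below are illustrative stand-ins for CAMH's learned quantities.

```python
# Binary codes, per-class Hamming centroids, and the distance a
# centroid-approaching objective would minimize.
import numpy as np

def binary_codes(features, projection):
    """Project real-valued features (image or text) and binarize by sign."""
    return (features @ projection > 0).astype(np.uint8)

def class_centroids(codes, labels):
    """labels: (n,) int array; centroid = per-bit majority over the class."""
    return {c: (codes[labels == c].mean(axis=0) >= 0.5).astype(np.uint8)
            for c in np.unique(labels)}

def centroid_loss(codes, labels, centroids):
    """Mean Hamming distance of each code to its class centroid (to minimize)."""
    return np.mean([np.count_nonzero(codes[i] != centroids[labels[i]])
                    for i in range(len(codes))])
```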
Citations: 5
Spatial perception reproduction of sound events based on sound property coincidences
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177412
Maosheng Zhang, R. Hu, Shihong Chen, Xiaochen Wang, Dengshi Li, Lin Jiang
Sound pressure and particle velocity are used to reproduce sound signals in multichannel systems. In Ando's study, the two sound properties were estimated step by step, and the particle velocity had to be scaled because the equations were ill-conditioned. We explore a new system of equations that maintains both sound pressure and particle velocity. The weight equations are solved in a non-traditional way to obtain exact solutions. Based on the proposed method, both the perceived direction of a sound event and its distance from the listening point are reproduced correctly in a three-dimensional reproduction system. A comparison between the proposed method and Ando's method is outlined; the proposed method is more flexible and useful. Objective evaluation shows that the wavefront produced by the proposed method is more accurate than that of Ando's method, and subjective evaluation confirms that the proposed method improves the spatial perception of sound events.
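To show what "one joint system for both properties" can look like, here is a sketch under a simple free-field model: each loudspeaker contributes pressure proportional to 1/r and a velocity component along its unit direction (far-field approximation), and the weights are found by least squares so that neither property needs to be scaled. The geometry, targets, and propagation model are illustrative assumptions, not the paper's formulation.

```python
# Solve jointly for loudspeaker weights matching target pressure and
# particle velocity at one listening point.
import numpy as np

def solve_weights(speaker_positions, listen_point, target_pressure, target_velocity):
    rel = speaker_positions - listen_point            # (n_speakers, 3)
    r = np.linalg.norm(rel, axis=1)
    pressure_row = 1.0 / r                            # simplified monopole pressure
    velocity_rows = (rel / r[:, None]).T / r          # direction-weighted 1/r terms
    A = np.vstack([pressure_row, velocity_rows])      # 4 equations: p, vx, vy, vz
    b = np.concatenate([[target_pressure], target_velocity])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Example: four speakers on the axes around a listener at the origin.
speakers = np.array([[1., 0, 0], [0, 1., 0], [-1., 0, 0], [0, -1., 0]])
w = solve_weights(speakers, np.zeros(3), 1.0, np.array([0.2, 0.0, 0.0]))
```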
Citations: 2