Vectors of locally aggregated centers for compact video representation
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177501
Alhabib Abbas, N. Deligiannis, Y. Andreopoulos
We propose a novel vector aggregation technique for compact video representation, with application in accurate similarity detection within large video datasets. The current state of the art in visual search is formed by the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim of increasing robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video-segment similarity detection. Experimentation on a video dataset comprising more than 1000 minutes of content from the Open Video Project shows that VLAC obtains substantial gains in terms of mean Average Precision (mAP) over VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.
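A minimal sketch (not the authors' code) of the aggregation idea, assuming per-frame SIFT descriptors are already available as NumPy arrays and that the CLFCs have been learned offline on a training set; the cluster count n_lfc is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def vlac_descriptor(frame_sifts, clfcs, n_lfc=64):
    """Sketch of VLAC: cluster the segment's SIFT features into local feature
    centers (LFCs), then aggregate LFC-to-CLFC residuals, VLAD-style."""
    sifts = np.vstack(frame_sifts)                  # all SIFT vectors of the video segment
    lfcs = KMeans(n_clusters=n_lfc, n_init=4).fit(sifts).cluster_centers_
    k, d = clfcs.shape                              # centers of local feature centers (from training)
    desc = np.zeros((k, d))
    nearest = np.argmin(((lfcs[:, None, :] - clfcs[None, :, :]) ** 2).sum(-1), axis=1)
    for i, c in enumerate(nearest):                 # sum of differences per CLFC
        desc[c] += lfcs[i] - clfcs[c]
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)    # L2-normalized compact signature
```

Operating on LFCs rather than raw SIFT vectors is what makes the representation coarser and, per the abstract, more robust to visual distortions.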
{"title":"Vectors of locally aggregated centers for compact video representation","authors":"Alhabib Abbas, N. Deligiannis, Y. Andreopoulos","doi":"10.1109/ICME.2015.7177501","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177501","url":null,"abstract":"We propose a novel vector aggregation technique for compact video representation, with application in accurate similarity detection within large video datasets. The current state-of-the-art in visual search is formed by the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim to increase robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sum-of-differences between the LFCs and the CLFCs are aggregated to generate an extremely-compact video description used for accurate video segment similarity detection. Experimentation using a video dataset, comprising more than 1000 minutes of content from the Open Video Project, shows that VLAC obtains substantial gains in terms of mean Average Precision (mAP) against VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121984016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inter-frame dependent rate-distortion optimization using lagrangian multiplier adaption
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177467
Shuai Li, Ce Zhu, Yanbo Gao, Yimin Zhou, F. Dufaux, Ming-Ting Sun
It is known that, in the current hybrid video coding structure, spatial and temporal prediction techniques are extensively used, which introduces strong dependency among coding units. Such dependency poses a great challenge to performing global rate-distortion optimization (RDO) when encoding a video sequence. RDO is usually performed in a way that the coding efficiency of each coding unit is optimized independently, without considering the dependency among coding units, leading to a suboptimal coding result for the whole sequence. In this paper, we investigate inter-frame dependent RDO, where the impact of the coding performance of the current coding unit on that of the following frames is considered. Accordingly, an inter-frame dependent rate-distortion optimization scheme is proposed and implemented on the platform of the newest video coding standard, High Efficiency Video Coding (HEVC). Experimental results show that the proposed scheme achieves about 3.19% BD-rate saving on average over the state-of-the-art HEVC codec (HM15.0) in the low-delay B coding structure, with no extra encoding time. It obtains a significantly higher coding gain than the multiple-QP (±3) optimization technique, which greatly increases the encoding time by a factor of about 6. Coupled with the multiple-QP optimization, the proposed scheme further achieves higher BD-rate savings of 5.57% and 4.07% on average over the HEVC codec and the multiple-QP-optimization-enabled HEVC codec, respectively.
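A toy illustration (not the HM15.0 implementation) of the underlying idea: scale a frame's Lagrangian multiplier by an estimate of how strongly its distortion propagates to later frames through prediction; propagation_factor is a hypothetical input that would be estimated from the coding structure.

```python
def adapted_lambda(base_lambda, propagation_factor):
    """Toy inter-frame dependent RDO: if a fraction p of this frame's distortion
    is inherited by future frames, its effective distortion weight grows by
    roughly (1 + p), so lambda is reduced to spend more bits on influential frames."""
    return base_lambda / (1.0 + propagation_factor)

def rd_cost(distortion, rate, lam):
    # Standard Lagrangian cost J = D + lambda * R used for mode decisions.
    return distortion + lam * rate

# A heavily referenced frame (p = 0.8) gets a smaller lambda than a frame
# whose distortion barely propagates (p = 0.1).
print(adapted_lambda(100.0, 0.8), adapted_lambda(100.0, 0.1))
```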
{"title":"Inter-frame dependent rate-distortion optimization using lagrangian multiplier adaption","authors":"Shuai Li, Ce Zhu, Yanbo Gao, Yimin Zhou, F. Dufaux, Ming-Ting Sun","doi":"10.1109/ICME.2015.7177467","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177467","url":null,"abstract":"It is known that, in the current hybrid video coding structure, spatial and temporal prediction techniques are extensively used which introduce strong dependency among coding units. Such dependency poses a great challenge to perform a global rate-distortion optimization (RDO) when encoding a video sequence. RDO is usually performed in a way that coding efficiency of each coding unit is optimized independently without considering dependeny among coding units, leading to a suboptimal coding result for the whole sequence. In this paper, we investigate the inter-frame dependent RDO, where the impact of coding performance of the current coding unit on that of the following frames is considered. Accordingly, an inter-frame dependent rate-distortion optimization scheme is proposed and implemented on the newest video coding standard High Efficiency Video Coding (HEVC) platform. Experimental results show that the proposed scheme can achieve about 3.19% BD-rate saving in average over the state-of-the-art HEVC codec (HM15.0) in the low-delay B coding structure, with no extra encoding time. It obtains a significantly higher coding gain than the multiple QP (±3) optimization technique which would greatly increase the encoding time by a factor of about 6. Coupled with the multiple QP optimization, the proposed scheme can further achieve a higher BD-rate saving of 5.57% and 4.07% in average than the HEVC codec and the multiple QP optimization enabled HEVC codec, respectively.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124814568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SegBOMP: An efficient algorithm for block non-sparse signal recovery
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177503
Xushan Chen, Xiongwei Zhang, Jibin Yang, Meng Sun, Li Zeng
Block-sparse signal recovery methods, which exploit the fact that the nonzero coefficients cluster into blocks, have attracted great interest. Compared with traditional compressive sensing methods, they obtain better recovery performance with fewer measurements by exploiting the block sparsity explicitly. In this paper we propose a segmented version of the block orthogonal matching pursuit (BOMP) algorithm, which divides the vector into several sparse sub-vectors. By doing this, the original method can be significantly accelerated, owing to the reduced measurement dimension for each segmented vector. Experimental results show that, at low complexity, the proposed method yields reconstruction performance identical to or even better than that of conventional methods that treat the signal in the standard block-sparse fashion. Furthermore, in the specific case where not all segments contain nonzero blocks, the performance improvement can be interpreted as a gain in "effective SNR" in noisy environments.
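A minimal sketch of the segmentation idea under stated assumptions (equal-size blocks, signal length divisible by the block size, one sensing matrix per segment); it is not the authors' implementation. Plain block OMP is simply run independently on each much smaller segment.

```python
import numpy as np

def block_omp(A, y, block_size, n_blocks):
    """Plain block orthogonal matching pursuit: greedily pick the block of
    columns most correlated with the residual, then re-fit by least squares."""
    n = A.shape[1]
    blocks = [np.arange(b, b + block_size) for b in range(0, n, block_size)]
    chosen, x, residual = [], np.zeros(n), y.copy()
    for _ in range(n_blocks):
        scores = [np.linalg.norm(A[:, b].T @ residual) for b in blocks]
        chosen.append(blocks[int(np.argmax(scores))])
        idx = np.concatenate(chosen)
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        x[:] = 0.0
        x[idx] = coef
        residual = y - A @ x
    return x

def seg_bomp(segments, block_size, n_blocks):
    """Segmented variant: each (A_i, y_i) pair senses one sub-vector, so every
    solve works with a far smaller dimension than the full problem."""
    return [block_omp(A_i, y_i, block_size, n_blocks) for A_i, y_i in segments]
```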
{"title":"SegBOMP: An efficient algorithm for block non-sparse signal recovery","authors":"Xushan Chen, Xiongwei Zhang, Jibin Yang, Meng Sun, Li Zeng","doi":"10.1109/ICME.2015.7177503","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177503","url":null,"abstract":"Block sparse signal recovery methods have attracted great interests which take the block structure of the nonzero coefficients into account when clustering. Compared with traditional compressive sensing methods, it can obtain better recovery performance with fewer measurements by utilizing the block-sparsity explicitly. In this paper we propose a segmented-version of the block orthogonal matching pursuit algorithm in which it divides any vector into several sparse sub-vectors. By doing this, the original method can be significantly accelerated due to the dimension reduction of measurements for each segmented vector. Experimental results showed that with low complexity the proposed method yielded identical or even better reconstruction performance than the conventional methods which treated the signal in the standard block-sparsity fashion. Furthermore, in the specific case, where not all segments contain nonzero blocks, the performance improvement can be interpreted as a gain in “effective SNR” in noisy environment.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130269838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Gaussian mixture model for saliency detection on face images
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177465
Yun Ren, Mai Xu, Ruihan Pan, Zulin Wang
Previous work has demonstrated that integrating top-down features into bottom-up saliency methods can improve saliency prediction accuracy. Therefore, for face images, this paper proposes a saliency detection method based on a Gaussian mixture model (GMM), which learns the distribution of saliency over face regions as the top-down feature. Specifically, we verify that fixations tend to cluster around facial features when viewing images with large faces. Thus, the GMM is learnt from fixations of eye-tracking data to establish the distribution of saliency in faces. Then, in our method, the top-down feature based on the learnt GMM is combined with the conventional bottom-up features (i.e., color, intensity, and orientation) for saliency detection. Finally, experimental results validate that our method is capable of improving the accuracy of saliency prediction for face images.
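A minimal sketch of the top-down component (not the authors' code), assuming fixation coordinates have already been normalised to each detected face box; the linear fusion weight is illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_face_saliency_gmm(normalised_fixations, n_components=3):
    """Learn the distribution of fixations inside a face box (coordinates in
    [0, 1]^2) from eye-tracking data; the mixture serves as the top-down prior."""
    return GaussianMixture(n_components=n_components).fit(normalised_fixations)

def top_down_face_map(gmm, face_box, image_shape):
    """Evaluate the learnt GMM over the pixels of one detected face box."""
    h, w = image_shape
    x0, y0, x1, y1 = face_box
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalise pixel coordinates relative to the face box before scoring.
    pts = np.stack([(xs - x0) / max(x1 - x0, 1), (ys - y0) / max(y1 - y0, 1)], axis=-1)
    saliency = np.exp(gmm.score_samples(pts.reshape(-1, 2))).reshape(h, w)
    inside = (xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)
    return saliency * inside                        # zero outside the face region

def combine(bottom_up, top_down, w_td=0.5):
    # Illustrative linear fusion with the conventional bottom-up saliency map.
    return (1 - w_td) * bottom_up + w_td * top_down
```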
{"title":"Learning Gaussian mixture model for saliency detection on face images","authors":"Yun Ren, Mai Xu, Ruihan Pan, Zulin Wang","doi":"10.1109/ICME.2015.7177465","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177465","url":null,"abstract":"The previous work has demonstrated that integrating top-down features in bottom-up saliency methods can improve the saliency prediction accuracy. Therefore, for face images, this paper proposes a saliency detection method based on Gaussian mixture model (GMM), which learns the distribution of saliency over face regions as the top-down feature. Specifically, we verify that fixations tend to cluster around facial features, when viewing images with large faces. Thus, the GMM is learnt from fixations of eye tracking data, for establishing the distribution of saliency in faces. Then, in our method, the top-down feature upon the the learnt GMM is combined with the conventional bottom-up features (i.e., color, intensity, and orientation), for saliency detection. Finally, experimental results validate that our method is capable of improving the accuracy of saliency prediction for face images.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128363983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What is the next step of binary features?
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177429
Zhendong Mao, Lei Zhang, Bin Wang, Li Guo
Various binary features have recently been proposed in the literature, aiming to improve the computational and storage efficiency of image retrieval applications. However, the most common way of using binary features is a voting strategy based on brute-force matching: binary features are discrete points distributed in Hamming space, so clustering-based models such as BoW are unsuitable for them. Although indexing mechanisms substantially decrease the time cost, the brute-force matching strategy becomes a bottleneck that restricts the performance of binary features. To address this issue, we propose a simple but effective method, namely COIP (Coding by Order-independent Projection), which projects binary features into a binary code of limited bits. As a result, each image is represented by one single binary code that can be indexed for computational and storage efficiency. We prove that the similarity between the COIP codes of two images is, with high probability, proportional to the ratio of their matched features. A comprehensive evaluation with several state-of-the-art binary features is performed on a benchmark dataset. Experimental results reveal that for binary feature based image retrieval, our approach improves the storage/time efficiency by one/two orders of magnitude, while the retrieval performance remains almost unchanged.
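The abstract does not spell out the projection, so the sketch below only illustrates one plausible order-independent pooling of binary descriptors (bitwise mean statistics followed by a fixed random sign projection); it is not the COIP construction. It assumes 256-bit descriptors such as ORB.

```python
import numpy as np

# Fixed, shared projection so codes of different images remain comparable.
_PROJ = np.random.default_rng(0).standard_normal((256, 256))

def image_code(binary_features):
    """Pool all binary descriptors of an image (n_features x 256, values in
    {0, 1}) into one 256-bit code; the result does not depend on feature order."""
    pooled = binary_features.mean(axis=0) - 0.5     # centred bitwise statistics
    return (pooled @ _PROJ > 0).astype(np.uint8)

def hamming_similarity(a, b):
    # One indexable code per image replaces brute-force feature matching.
    return 1.0 - np.count_nonzero(a != b) / a.size
```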
{"title":"What is the next step of binary features?","authors":"Zhendong Mao, Lei Zhang, Bin Wang, Li Guo","doi":"10.1109/ICME.2015.7177429","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177429","url":null,"abstract":"Various binary features have been recently proposed in literature, aiming at improving the computational efficiency and storage efficiency of image retrieval applications. However, the most common way of using binary features is voting strategy based on brute-force matching, since binary features are discrete data points distributed in Hamming space, so that models based on clustering such as BoW are unsuitable for them. Although indexing mechanism substantially decreases the time cost, the brute-force matching strategy becomes a bottleneck that restricts the performance of binary features. To address this issue, we propose a simple but effective method, namely COIP (Coding by Order-independent Projection), which projects binary features into a binary code of limited bits. As a result, each image is represented by one single binary code that can be indexed for computational and storage efficiency. We prove that the similarity between the COIP codes of two images with probability proportional to the ratio of their matched features. A comprehensive evaluation with several state-of-the-art binary features is performed on benchmark dataset. Experimental results reveal that for binary feature based image retrieval, our approach improves the storage/time efficiency by one/two orders of magnitude, while the retrieval performance remains almost unchanged.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122068009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An adaptive detecting strategy against motion vector-based steganography
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177410
Peipei Wang, Yun Cao, Xianfeng Zhao, Haibo Yu
The goal of this paper is to improve the performance of current video steganalysis in detecting motion vector (MV)-based steganography. It is noticed that many MV-based approaches embed secret bits in a content-adaptive manner. Typically, the modifications are applied only to qualified MVs, which implies that the number of modified MVs varies among frames after embedding. On the other hand, nearly all current steganalytic methods ignore such uneven distribution: they divide the video into equal frame groups and calculate every single feature vector using all MVs within one group. For better classification performance, we suggest performing steganalysis in an adaptive way as well. First, divide the video into groups with variable lengths according to frame dynamics. Then, within each group, calculate a single feature vector using all suspicious MVs (MVs that are likely to be modified). The experimental results have shown the effectiveness of our proposed strategy.
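A minimal sketch of the adaptive grouping step (the steganalytic features themselves are omitted), assuming a per-frame motion-activity score is available; the threshold and the 2-D histogram feature are illustrative, not the paper's feature set.

```python
import numpy as np

def adaptive_groups(motion_activity, change_threshold=0.3, min_len=5):
    """Split the video into variable-length frame groups: start a new group when
    the motion activity changes sharply, so each group is roughly homogeneous in
    frame dynamics instead of having a fixed length."""
    boundaries = [0]
    for t in range(1, len(motion_activity)):
        jump = abs(motion_activity[t] - motion_activity[t - 1])
        if jump > change_threshold and t - boundaries[-1] >= min_len:
            boundaries.append(t)
    boundaries.append(len(motion_activity))
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]

def group_feature(suspicious_mvs_per_frame, group):
    """One feature vector per group, computed only from suspicious MVs
    (MVs that are likely to have been modified by the embedder)."""
    start, end = group
    mvs = np.vstack([m for m in suspicious_mvs_per_frame[start:end] if len(m)])
    return np.histogram2d(mvs[:, 0], mvs[:, 1], bins=8)[0].ravel()
```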
{"title":"An adaptive detecting strategy against motion vector-based steganography","authors":"Peipei Wang, Yun Cao, Xianfeng Zhao, Haibo Yu","doi":"10.1109/ICME.2015.7177410","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177410","url":null,"abstract":"The goal of this paper is to improve the performance of the current video steganalysis in detecting motion vector (MV)-based steganography. It is noticed that many MV-based approaches embed secret bits in content adaptive manners. Typically, the modifications are applied only to qualified MVs, which implies that the number of modified MVs varies among frames after embedding. On the other hand, nearly all the current steganalytic methods ignore such uneven distribution. They divide the video into frame groups equally and calculate every single feature vector using all MVs within one group. For better classification performances, we suggest performing steganalysis also in an adaptive way. First, divide the video into groups with variable lengths according to frame dynamics. Then within each group, calculate a single feature vector using all suspicious MVs (MVs that are likely to be modified). The experimental results have shown the effectiveness of our proposed strategy.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117066383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal spotting of human actions from videos containing actor's unintentional motions
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177481
K. Hara, Kazuaki Nakamura, N. Babaguchi
This paper proposes a method for temporal action spotting: the temporal segmentation and classification of human actions in videos. Naturally performed human actions often involve the actor's unintentional motions. These unintentional motions yield false visual evidence in the videos, which is not related to the performed actions and degrades the performance of temporal action spotting. To deal with this problem, our proposed method employs a voting-based approach in which the temporal relation between each action and its visual evidence is probabilistically modeled as a voting score function. With this approach, our method can robustly spot the target actions even when the actions involve several unintentional motions, because the effect of the false visual evidence yielded by the unintentional motions can be canceled out by the other visual evidence observed with the target actions. Experimental results showed that the proposed method is highly robust to the unintentional motions.
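A toy sketch of the voting idea (not the authors' probabilistic model), assuming each piece of detected evidence carries an action label, a time stamp, and a learnt evidence-to-action temporal offset, with the voting score function approximated here by a Gaussian.

```python
import numpy as np

def vote_scores(evidences, n_frames, sigma=15.0):
    """Accumulate probabilistic votes over time for each action class. Spurious
    evidence caused by unintentional motions is outweighed by the many
    consistent votes that a genuine action produces around its true location."""
    classes = sorted({e["action"] for e in evidences})
    timeline = np.arange(n_frames)
    scores = {c: np.zeros(n_frames) for c in classes}
    for e in evidences:
        centre = e["time"] + e.get("offset", 0.0)   # learnt temporal offset
        scores[e["action"]] += e.get("weight", 1.0) * np.exp(
            -0.5 * ((timeline - centre) / sigma) ** 2)
    return scores

def spot(scores, threshold):
    # Report (class, frame) pairs where the accumulated vote exceeds a threshold.
    return [(c, int(t)) for c, s in scores.items() for t in np.flatnonzero(s > threshold)]
```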
{"title":"Temporal spotting of human actions from videos containing actor's unintentional motions","authors":"K. Hara, Kazuaki Nakamura, N. Babaguchi","doi":"10.1109/ICME.2015.7177481","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177481","url":null,"abstract":"This paper proposes a method for temporal action spotting: the temporal segmentation and classification of human actions in videos. Naturally performed human actions often involve actor's unintentional motions. These unintentional motions yield false visual evidences in the videos, which are not related to the performed actions and degrade the performance of temporal action spotting. To deal with this problem, our proposed method empolys a voting-based approach in which the temporal relation between each action and its visual evidence is probabilistically modeled as a voting score function. Due to the approach, our method can robustly spot the target actions even when the actions involve several unintentional motions, because the effect of the false visual evidences yielded by the unintentional motions can be canceled by other visual evidences observed with the target actions. Experimental results showed that the proposed method is highly robust to the unintentional motions.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127093389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing dynamic textures with space-time lacunarity analysis
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177482
Yuping Sun, Yong Xu, Yuhui Quan
This paper addresses the challenge of reliably capturing the temporal characteristics of local space-time patterns in dynamic texture (DT). A powerful DT descriptor is proposed, which enjoys strong robustness to viewpoint changes, illumination changes, and video deformation. Observing that local DT patterns are spatio-temporally distributed with stationary irregularities, we propose to characterize the distributions of local binarized DT patterns along both the temporal and the spatial axes via lacunarity analysis. We also observe that such irregularities are similar across DT slices along the same axis but distinct between axes. Thus, the resulting lacunarity-based features are averaged along each axis and concatenated to form the final DT descriptor. We applied the proposed DT descriptor to DT classification and evaluated its performance on several benchmark datasets. The experimental results demonstrate the power of the proposed descriptor in comparison with existing ones.
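A minimal sketch of gliding-box lacunarity on one binarized slice (not the full descriptor), assuming the classical definition Λ(r) = E[M²] / E[M]²; per the abstract, the multi-scale values are then averaged over the slices taken along each axis and concatenated.

```python
import numpy as np

def gliding_box_lacunarity(binary_slice, box_size):
    """Classical gliding-box lacunarity of a 2-D binary pattern:
    Lambda(r) = E[M^2] / E[M]^2, where M is the number of set pixels inside an
    r x r window slid over the slice."""
    h, w = binary_slice.shape
    masses = []
    for i in range(h - box_size + 1):
        for j in range(w - box_size + 1):
            masses.append(binary_slice[i:i + box_size, j:j + box_size].sum())
    masses = np.asarray(masses, dtype=float)
    return (masses ** 2).mean() / (masses.mean() ** 2 + 1e-12)

def axis_feature(binary_volume, axis, box_sizes=(2, 4, 8)):
    """Average multi-scale lacunarity over all slices taken along one axis
    (x, y or t) of the binarized space-time volume."""
    slices = np.moveaxis(binary_volume, axis, 0)
    return np.array([np.mean([gliding_box_lacunarity(s, r) for s in slices])
                     for r in box_sizes])
```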
{"title":"Characterizing dynamic textures with space-time lacunarity analysis","authors":"Yuping Sun, Yong Xu, Yuhui Quan","doi":"10.1109/ICME.2015.7177482","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177482","url":null,"abstract":"This paper addresses the challenge of reliably capturing the temporal characteristics of local space-time patterns in dynamic texture (DT). A powerful DT descriptor is proposed, which enjoys strong robustness to viewpoint changes, illumination changes, and video deformation. Observing that local DT patterns are spatial-temporally distributed with stationary irregularities, we proposed to characterize the distributions of local binarized DT patterns along both the temporal and the spatial axes via lacunarity analysis. We also observed such irregularities are similar on the DT slices along the same axis but distinct between axes. Thus, the resulting lacunarity based features are averaged along each axis and concatenated as the final DT descriptor. We applied the proposed DT descriptor to DT classification and evaluated its performance on several benchmark datasets. The experimental results have demonstrated the power of the proposed descriptor in comparison with existing ones.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125900849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-media hashing with Centroid Approaching
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177473
Ruoyu Liu, Yao Zhao, Shikui Wei, Zhenfeng Zhu
Cross-media retrieval has received increasing interest in recent years; it aims to address the semantic correlation issues within rich media. As two key aspects, cross-media representation and indexing have been studied for dealing with cross-media similarity measurement and the scalability issue, respectively. In this paper, we propose a new cross-media hashing scheme, called Centroid Approaching Cross-Media Hashing (CAMH), to handle both cross-media representation and indexing simultaneously. Different from existing indexing methods, the proposed method introduces semantic category information into the learning procedure, leading to more exact hash codes for instances of multiple media types. In addition, we present a comparative study of cross-media indexing methods under a unified evaluation framework. Extensive experiments on two commonly used datasets demonstrate good performance in terms of search accuracy and time complexity.
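The abstract gives no formulation, so the sketch below only conveys the general flavour under stated assumptions: learn one linear projection per modality so that projected training samples move toward the centroid of their semantic class in a shared space, then binarize by sign. It is not the CAMH objective, and all parameter values are illustrative.

```python
import numpy as np

def fit_modality_projection(X, labels, n_bits, n_iter=20, lr=0.1):
    """Toy 'centroid approaching' training for one modality: nudge a linear
    projection W so each projected sample approaches the mean projection
    (centroid) of its semantic class in the shared n_bits-dimensional space."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], n_bits)) / np.sqrt(X.shape[1])
    for _ in range(n_iter):
        Z = X @ W
        centroids = {c: Z[labels == c].mean(axis=0) for c in np.unique(labels)}
        target = np.stack([centroids[c] for c in labels])
        W += lr * X.T @ (target - Z) / len(X)       # gradient step on ||Z - target||^2
    return W

def hash_codes(X, W):
    # Sign binarization yields indexable codes shared across media types.
    return (X @ W > 0).astype(np.uint8)
```

In a cross-media setting, one projection would be fitted per media type (e.g. image features and text features) against the same class centroids, so the resulting codes are directly comparable.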
{"title":"Cross-media hashing with Centroid Approaching","authors":"Ruoyu Liu, Yao Zhao, Shikui Wei, Zhenfeng Zhu","doi":"10.1109/ICME.2015.7177473","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177473","url":null,"abstract":"Cross-media retrieval has received increasing interest in recent years, which aims to addressing the semantic correlation issues within rich media. As two key aspects, cross-media representation and indexing have been studied for dealing with cross-media similarity measure and the scalability issue, respectively. In this paper, we propose a new cross-media hashing scheme, called Centroid Approaching Cross-Media Hashing (CAMH), to handle both cross-media representation and indexing simultaneously. Different from existing indexing methods, the proposed method introduces semantic category information into the learning procedure, leading to more exact hash codes of multiple media type instances. In addition, we present a comparative study of cross-media indexing methods under a unique evaluation framework. Extensive experiments on two commonly used datasets demonstrate the good performance in terms of search accuracy and time complexity.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132579882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial perception reproduction of sound events based on sound property coincidences
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177412
Maosheng Zhang, R. Hu, Shihong Chen, Xiaochen Wang, Dengshi Li, Lin Jiang
Sound pressure and particle velocity are used to reproduce sound signals in multichannel systems. In Ando's study, the two sound properties were estimated step by step, and particle velocity was scaled owing to ill-conditioned equations. We explore a new system of equations that maintains both sound pressure and particle velocity. The weight equations are solved in a non-traditional way to obtain exact solutions. Based on the proposed method, the perception of the direction of a sound event and the distance to the listening point are both reproduced correctly in a three-dimensional reproduction system. The comparison between the proposed method and Ando's method is outlined, and the proposed method proves more flexible and useful. Objective evaluation shows that the wavefront produced by the proposed method is more accurate than that of Ando's method, and subjective evaluation confirms that the proposed method improves the spatial perception of sound events.
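A minimal numerical sketch of the underlying constraint (not the paper's equation system), assuming a unit-pressure plane-wave target: the gains must reproduce both the sound pressure (sum of gains) and the particle velocity (gain-weighted sum of loudspeaker unit direction vectors) at the listening point. The speaker layout and sign conventions here are illustrative.

```python
import numpy as np

def loudspeaker_gains(speaker_dirs, source_dir):
    """Solve for gains g such that sum(g) matches the target pressure (1) and
    sum(g_i * speaker_dirs[i]) matches the target particle-velocity direction.
    speaker_dirs: (n, 3) unit vectors; source_dir: (3,) unit vector."""
    n = speaker_dirs.shape[0]
    A = np.vstack([np.ones((1, n)), speaker_dirs.T])   # pressure row + 3 velocity rows
    b = np.concatenate([[1.0], source_dir])            # target pressure and velocity
    gains, *_ = np.linalg.lstsq(A, b, rcond=None)      # exact when A is square and well conditioned
    return gains

# Example: four loudspeakers on a regular tetrahedron reproducing a source at +x.
dirs = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
print(loudspeaker_gains(dirs, np.array([1.0, 0.0, 0.0])))
```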
{"title":"Spatial perception reproduction of sound events based on sound property coincidences","authors":"Maosheng Zhang, R. Hu, Shihong Chen, Xiaochen Wang, Dengshi Li, Lin Jiang","doi":"10.1109/ICME.2015.7177412","DOIUrl":"https://doi.org/10.1109/ICME.2015.7177412","url":null,"abstract":"Sound pressure and particle velocity are used to reproduce sound signals in multichannel systems. The two sound properties were estimated step by step and particle velocity was scaled due to ill-conditioned equations in Ando's study. We explore a new system of equations to maintain both sound pressure and particle velocity. The weight equations are solved in a non-traditional way to figure out exact solutions. Based on the proposed method, the perception of the direction of a sound event and the distance to the listening point are both reproduced correctly in a three-dimension reproduction system. The comparison between the proposed method and Ando's method is outlined and the proposed method is more flexible and useful. Objective evaluation shows the wavefront in the proposed method is more accurate than Ando's method and subjective evaluation confirms that the proposed method improves the spatial perception of sound events.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116594966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}