Image matching underpins image-guided technology and is a key technology in modern warfare. This paper proposes a new affine-invariant detector and descriptor for local invariant feature points. Approaching the problem from feature point detection and description, it addresses two shortcomings of traditional feature point extraction: the small number of extracted points and the limited variety of point types. It also proposes an improved similarity measure built on the new detection and description algorithm, which improves matching accuracy and real-time performance. Finally, the paper compares experimental results for SURF, SIFT, and the proposed algorithm. The results show that the feature points extracted by the proposed algorithm are fully affine invariant, and that the algorithm improves both the accuracy and the speed of image matching.
Title: Research on Image Matching Algorithm Based on Local Invariant Features | Authors: Jiaqi Liu, Qiang Wu, Xuwen Li | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.37 (https://doi.org/10.1109/IIH-MSP.2013.37)
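The abstract does not spell out the improved similarity measure, so the sketch below only illustrates the kind of baseline it would be compared against: nearest-neighbour matching of descriptor vectors with a distance-ratio test, as commonly used with SIFT-style descriptors. The function name `match_descriptors`, the `ratio` threshold of 0.8, and the toy descriptors are illustrative assumptions, not details from the paper.

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b.

    A match is kept only when the nearest distance is clearly smaller than
    the second-nearest one (nearest-neighbour distance-ratio test).
    NOTE: baseline sketch only; this is not the paper's improved measure.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from descriptor i to every descriptor in desc_b.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches

# Toy 2-D "descriptors" (real descriptors have 64 or 128 dimensions).
a = [(0.0, 0.0), (10.0, 10.0)]
b = [(0.1, 0.0), (10.0, 10.1), (5.0, 5.0)]
print(match_descriptors(a, b))
```

Lowering `ratio` rejects more ambiguous matches at the cost of fewer correspondences, which is the accuracy/coverage trade-off any replacement similarity measure would have to improve on.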
Lip movement is closely related to speech because the lips move when we talk. The idea behind this work is to extract lip movement features from facial video and embed them into the speech signal using an information hiding technique. With the proposed framework, advanced speech communication can be provided using only a speech signal that carries lip movement features, without increasing the bitrate of the signal. In this paper, we present the basic framework of the method and apply the proposed method to multi-modal voice activity detection (VAD). In a detection experiment using a support vector machine, we obtained better performance than audio-only VAD in a noisy environment. In addition, we investigated how embedding data into the speech signal affects sound quality and detection performance.
Title: Multi-modal Voice Activity Detection by Embedding Image Features into Speech Signal | Authors: Yohei Abe, A. Ito | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.76 (https://doi.org/10.1109/IIH-MSP.2013.76)
We propose a speech watermarking method based on modifications to the line spectral frequencies (LSFs) of the original speech. LSFs were derived from each frame by linear prediction (LP) analysis, and watermarks were embedded into them using quantization index modulation (QIM) with different quantization steps. We took into account how minor modifications to LSFs influence inaudibility and robustness. The proposed approach was evaluated in two kinds of experiments, one on inaudibility and one on robustness against different speech codecs and general processing. The evaluations revealed that the proposed approach not only achieved a high bit detection rate while keeping the original sound quality undistorted, but was also robust against general speech processing.
Title: Watermarking Method for Speech Signals Based on Modifications to LSFs | Authors: Shengbei Wang, M. Unoki | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.79 (https://doi.org/10.1109/IIH-MSP.2013.79)
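To make the QIM step concrete, here is a minimal sketch of scalar quantization index modulation applied to a single value, such as one LSF coefficient. Which LSFs are modified, the actual step sizes, and any constraints that keep the LSFs ordered are not given in the abstract; the names `qim_embed`, `qim_detect`, and the step of 0.02 are assumptions for illustration.

```python
def qim_embed(value, bit, step):
    """Move value to the nearest point of the lattice selected by bit.

    Bit 0 uses multiples of step; bit 1 uses the same lattice shifted by
    step / 2. The change to value is at most step / 2.
    """
    offset = bit * step / 2.0
    return round((value - offset) / step) * step + offset

def qim_detect(value, step):
    """Return the bit whose lattice has a point closest to value."""
    errors = [abs(value - qim_embed(value, b, step)) for b in (0, 1)]
    return 0 if errors[0] <= errors[1] else 1

# Hypothetical LSF value (in radians) and quantization step.
lsf, step = 0.3137, 0.02
marked = qim_embed(lsf, 1, step)
# Detection survives perturbations smaller than step / 4.
print(marked, qim_detect(marked + 0.004, step))
```

The step size sets the trade-off the abstract describes: a larger step tolerates more distortion from codecs (robustness) but moves the LSF further from its original value (audibility).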
Wen-ling Jiang, Jing Wang, Yi Zhao, Baoguang Liu, Xuan Ji
By exploiting human perception of spatial sound, this paper puts forward a new approach to compression coding of multi-channel audio signals based on the ITU-T G.719 codec. Multi-channel input signals are converted into a down-mixed signal plus spatial perceptual parameters using step-by-step down-mix and up-mix techniques in the frequency domain. Combined with the G.719 audio codec, the algorithm can significantly reduce the coding rate while maintaining acceptable sound quality. The paper presents the implementation of the algorithm and describes in detail the calculation and characteristics of the selected spatial parameters. Finally, experiments evaluate the algorithm in terms of compression ratio, reconstructed sound quality, and algorithm complexity.
Title: Multi-channel Audio Compression Method Based on ITU-T G.719 Codec | Authors: Wen-ling Jiang, Jing Wang, Yi Zhao, Baoguang Liu, Xuan Ji | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.81 (https://doi.org/10.1109/IIH-MSP.2013.81)
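The abstract does not name the spatial parameters it selects. As a rough illustration of the "down-mix plus parameters" idea, the sketch below reduces one frequency band of a channel pair to a mono band and a single inter-channel level difference, then rebuilds two channels with the same energy ratio. The choice of level difference as the parameter, the function names, and the gain formulas are assumptions; a real codec would also carry phase or correlation cues and would pass the down-mix through the core coder (here G.719).

```python
import math

def downmix_band(left, right, eps=1e-12):
    """Down-mix one frequency band of two channels.

    Returns the averaged (mono) band and the inter-channel level
    difference in dB, an assumed stand-in for the paper's parameters.
    """
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    e_left = sum(abs(x) ** 2 for x in left) + eps
    e_right = sum(abs(x) ** 2 for x in right) + eps
    level_diff_db = 10.0 * math.log10(e_left / e_right)
    return mono, level_diff_db

def upmix_band(mono, level_diff_db):
    """Rebuild two channels whose energy ratio matches level_diff_db."""
    ratio = 10.0 ** (level_diff_db / 10.0)        # E_left / E_right
    g_left = math.sqrt(2.0 * ratio / (1.0 + ratio))
    g_right = math.sqrt(2.0 / (1.0 + ratio))      # g_left^2 + g_right^2 = 2
    return [g_left * m for m in mono], [g_right * m for m in mono]

mono, diff_db = downmix_band([2.0, 2.0], [1.0, 1.0])
left_hat, right_hat = upmix_band(mono, diff_db)
print(diff_db, left_hat, right_hat)
```

Only the mono band and one number per band need to be transmitted, which is where the bitrate saving in this family of methods comes from.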
Pub Date: 2013-10-16; DOI: 10.1109/IIH-MSP.2013.140
Leida Li, Wei Zhang, Shushang Li, Jeng-Shyang Pan
Region duplication is a common way to produce forged images: part of an image is copied and pasted elsewhere in the same image. To fit the scene better and leave no visible artifacts, the copied region may be processed by affine transforms before being pasted. Most existing methods cannot handle these transforms. This paper presents a method to detect region-duplication forgery under affine transforms. The image is first filtered and divided into overlapping circular blocks. The normalized color histogram (NCH) is then extracted as the block feature, and forgery detection is achieved by comparing the NCH features. A new filter is designed to process the initial detection results, and the final detection map is obtained after morphological operations. Simulations demonstrate the efficiency of the method.
Title: Detection of Region Duplication Forgery in Images under Affine Transforms | Authors: Leida Li, Wei Zhang, Shushang Li, Jeng-Shyang Pan | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.140 (https://doi.org/10.1109/IIH-MSP.2013.140)
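The following sketch shows why a histogram over a circular block is a natural feature here: it ignores pixel positions, so rotating the block about its centre leaves it unchanged. The exact NCH definition, bin count, block radius, and comparison metric used in the paper are not given in the abstract; `circular_block_nch`, `histogram_distance`, the 4 bins per channel, and the L1 distance are assumptions for illustration.

```python
def circular_block_nch(image, cx, cy, radius, bins=4):
    """Normalized color histogram of the pixels inside a circle.

    image is a list of rows; each pixel is an (r, g, b) tuple in 0..255.
    Each channel is quantized into `bins` levels, giving bins**3 cells.
    """
    hist = [0] * (bins ** 3)
    count = 0
    for y in range(max(0, cy - radius), min(len(image), cy + radius + 1)):
        row = image[y]
        for x in range(max(0, cx - radius), min(len(row), cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 > radius ** 2:
                continue  # outside the circular block
            r, g, b = (c * bins // 256 for c in row[x])
            hist[(r * bins + g) * bins + b] += 1
            count += 1
    # Normalize so the histogram sums to 1 regardless of block size.
    return [h / count for h in hist] if count else hist

def histogram_distance(h1, h2):
    """L1 distance between two histograms (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Synthetic 9x9 image and the same image rotated by 90 degrees.
image = [[((x * 37) % 256, (y * 91) % 256, ((x + y) * 13) % 256)
          for x in range(9)] for y in range(9)]
rotated = [[image[8 - x][y] for x in range(9)] for y in range(9)]
h_orig = circular_block_nch(image, 4, 4, 4)
h_rot = circular_block_nch(rotated, 4, 4, 4)
print(histogram_distance(h_orig, h_rot))
```

For arbitrary rotation angles and scalings the match is only approximate on a pixel grid, which is presumably why the pipeline filters the image first and cleans the initial detections afterwards.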
Pub Date: 2013-10-16; DOI: 10.1109/IIH-MSP.2013.139
Leida Li, Hancheng Zhu, Deqiang Cheng, Jeng-Shyang Pan
This paper presents a new full-reference image quality measure using discrete orthogonal moments. The sign of the moment is considered and the relative difference of the moments is obtained by comparing the absolute moment difference (AMD) with the magnitude of the original moment. A new quality function is proposed, which is an exponential function of the relative moment difference (RMD). Simulation results show the efficiency of the method.
Title: A New Moment Based Image Quality Metric | Authors: Leida Li, Hancheng Zhu, Deqiang Cheng, Jeng-Shyang Pan | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.139 (https://doi.org/10.1109/IIH-MSP.2013.139)
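The abstract names the ingredients (absolute moment difference, relative moment difference, an exponential quality function) but not the formulas. The sketch below is one plausible reading, offered only to make the chain of quantities concrete: the signed moments are subtracted before taking the absolute value, the result is divided by the magnitude of the reference moment, and the per-moment scores `exp(-RMD)` are averaged. The decay form, the averaging, the stabilizing constant `c`, and the function name are all assumptions; computing the discrete orthogonal moments themselves is out of scope here.

```python
import math

def moment_quality(ref_moments, dist_moments, c=1e-6):
    """Quality score in (0, 1]; 1 means the moments are identical.

    ref_moments / dist_moments: moments of the reference and distorted
    images, in the same order. The exact formula is an assumption.
    """
    scores = []
    for m_ref, m_dist in zip(ref_moments, dist_moments):
        amd = abs(m_ref - m_dist)        # absolute moment difference (sign-aware)
        rmd = amd / (abs(m_ref) + c)     # relative moment difference
        scores.append(math.exp(-rmd))    # exponential quality function
    return sum(scores) / len(scores)

reference = [1.0, -2.0, 0.5]
print(moment_quality(reference, reference))          # identical moments
print(moment_quality(reference, [1.0, 2.0, 0.5]))    # one sign flipped
```

Because the signed values are compared, a moment that keeps its magnitude but flips sign is penalized, which a magnitude-only comparison would miss.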
This paper compares two objective audio quality assessment measures, calculated for three watermarking methods, with the corresponding subjective quality. The aim was to see whether these measures could be used to estimate subjective audio quality for various watermarks. Samples were watermarked with the LSB substitution, direct spread-spectrum, and echo hiding methods. The objective scores were calculated using peaqb, an implementation of the ITU-R BS.1387-1 standard, and PEMO-Q. PEMO-Q showed a significantly higher correlation with the subjective scores than peaqb, about 0.90. Initial quality estimation tests were also conducted: a regression from the objective score to the subjective score was estimated for one watermark method (e.g. LSB), and this regression was then used to estimate the subjective score of another watermark method (e.g. spread-spectrum) from its objective score. PEMO-Q showed higher estimation accuracy, with a root mean square error (RMSE) of about 11%.
Title: Towards Estimation of Quality of Watermarked Audio Signal Using Objective Measures | Authors: K. Kondo | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.78 (https://doi.org/10.1109/IIH-MSP.2013.78)
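The cross-method estimation test can be sketched as follows: fit a regression on the (objective, subjective) pairs of one watermark method, apply it to the objective scores of another, and report the RMSE against that method's subjective scores. The abstract does not say which regression model was used, so a straight-line least-squares fit is assumed here, and every number below is made up purely to exercise the code.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Made-up scores: objective measure vs. subjective rating for method A.
objective_a = [0.2, 0.4, 0.6, 0.8]
subjective_a = [25.0, 45.0, 62.0, 83.0]
slope, intercept = fit_line(objective_a, subjective_a)

# Apply the method-A regression to made-up objective scores of method B.
objective_b = [0.3, 0.5, 0.7]
subjective_b = [30.0, 55.0, 70.0]
estimated_b = [slope * o + intercept for o in objective_b]
print(rmse(estimated_b, subjective_b))
```

A low RMSE on the held-out method is what indicates that the objective measure generalizes across watermark types rather than only fitting the one it was calibrated on.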
Multimedia systems have been widely developed. This paper proposes a new algorithm that uses a fast Fourier transform (FFT) for frequency pre-estimation and a carrier recovery loop for accurate fine tracking. The FFT-based pre-estimate of the carrier frequency offset first corrects the large offset; building on this, the carrier recovery loop then corrects the small residual frequency offset. Compared with other methods, its estimation is more accurate.
Title: A New Frequency Pre-estimation Aided Carrier Recovery Algorithm for Multimodal Signal System | Authors: Wang Ranran, Wang Botao, Lu-Xin Yan | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.49 (https://doi.org/10.1109/IIH-MSP.2013.49)
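The coarse-then-fine structure can be sketched on a noise-free complex tone. The coarse stage picks the strongest DFT bin, whose resolution is limited to `fs / N`; the signal is then de-rotated by that estimate and the small residual is measured. The paper's fine stage is a carrier recovery loop, whose design is not described in the abstract, so the sketch substitutes a simple average-phase-step estimator for it. All function names and the test signal are assumptions.

```python
import cmath
import math

def coarse_offset(samples, fs):
    """Frequency (Hz) of the strongest DFT bin; resolution is fs / N."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(n):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    if best_k > n // 2:
        best_k -= n  # map upper bins to negative frequencies
    return best_k * fs / n

def derotate(samples, freq, fs):
    """Remove a frequency offset of freq Hz from the samples."""
    return [s * cmath.exp(-2j * math.pi * freq * i / fs)
            for i, s in enumerate(samples)]

def residual_offset(samples, fs):
    """Fine estimate from the average phase step between neighbours.

    Stand-in for the carrier recovery loop; valid for |offset| < fs / 2.
    """
    acc = sum(b * a.conjugate() for a, b in zip(samples, samples[1:]))
    return cmath.phase(acc) * fs / (2 * math.pi)

fs, n, true_offset = 1000.0, 64, 130.0
tone = [cmath.exp(2j * math.pi * true_offset * i / fs) for i in range(n)]
coarse = coarse_offset(tone, fs)                       # limited to fs / n steps
fine = residual_offset(derotate(tone, coarse, fs), fs)
print(coarse, fine, coarse + fine)
```

Removing the large offset first keeps the residual well inside the range that a narrow-band tracking stage can pull in, which is the motivation for pre-estimation.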
Pub Date: 2013-10-16; DOI: 10.1109/IIH-MSP.2013.163
Zhi-Chun Li, Chunxiao Zhang
This paper proposes a DRM system based on PKCS#12 to meet the security and flexibility requirements of digital media applications. It designs the system architecture and the security protocols for user registration, certificate issuing, encrypted digital content distribution, authorized license delivery, authentication, and decryption. With the security features of PKCS#12 and the designed protocols, the proposed system can protect the certificate and private key during storage and transfer. The system also supports participation through different devices and can prevent illegal sharing of digital rights.
Title: Digital Rights Management System Based on PKCS#12 | Authors: Zhi-Chun Li, Chunxiao Zhang | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.163 (https://doi.org/10.1109/IIH-MSP.2013.163)
Hu Ruimin, Zhang Maosheng, Yang Yuhong, Wang Xiaochen, Shi Dong, Jiang Lin
Vector-based amplitude panning (VBAP) in three-dimensional sound reproduction aims to preserve both sound image direction and distance perception. In the gain estimation process, however, the loudspeakers are assumed to be placed on a sphere, a requirement that may not be met in a home environment. This study proposes an alternative method of estimating gain factors in vector-based amplitude panning that preserves distance perception. The experiments confirm that listeners do not perceive obvious distance differences when panning, which validates the proposed method.
Title: Gain Factors Calibration in 3D Sound Reproduction Using VBAP | Authors: Hu Ruimin, Zhang Maosheng, Yang Yuhong, Wang Xiaochen, Shi Dong, Jiang Lin | Venue: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing | Pub Date: 2013-10-16 | DOI: 10.1109/IIH-MSP.2013.82 (https://doi.org/10.1109/IIH-MSP.2013.82)
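For context, the conventional VBAP gain computation that the paper sets out to calibrate can be sketched as below: the source direction is written as a weighted sum of three loudspeaker direction vectors, and the weights are normalized to constant power. This is the standard formulation with loudspeakers on a unit sphere, not the alternative estimation proposed in the paper, whose details the abstract does not give. The function names and the example layout are assumptions.

```python
import math

def _det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def vbap_gains(source, speakers):
    """Gains g with g[0]*l0 + g[1]*l1 + g[2]*l2 parallel to source.

    source: unit vector to the virtual source; speakers: three unit
    vectors to the loudspeakers. Gains are scaled so sum(g^2) == 1.
    """
    # Matrix whose columns are the loudspeaker direction vectors.
    m = [[speakers[col][row] for col in range(3)] for row in range(3)]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions are not linearly independent")
    gains = []
    for k in range(3):
        # Cramer's rule: replace column k with the source direction.
        mk = [row[:] for row in m]
        for r in range(3):
            mk[r][k] = source[r]
        gains.append(_det3(mk) / det)
    norm = math.sqrt(sum(g * g for g in gains))
    return [g / norm for g in gains]

speakers = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
s = 1.0 / math.sqrt(3.0)
print(vbap_gains((s, s, s), speakers))
```

A negative gain indicates that the source direction lies outside the loudspeaker triplet, in which case a different triplet should be selected. Because the formulation uses directions only, loudspeakers at unequal distances break its assumptions, which is the situation the paper addresses.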