Nonexclusive audio segmentation and indexing as a pre-processor for audio information mining
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6743930
Francis F. Li
Much content-related information can be extracted from recorded soundtracks, such as those of multimedia files. Such soundtracks can be heuristically classified into three categories, namely speech, music, and ambient or event sounds. Past research focused on algorithms that classify audio clips in an exclusive manner. However, soundtracks from media content are often presented as overlapped mixtures of all three types of sound. Nonexclusive segmentation and indexing are therefore essential pre-processors for effective audio information mining and metadata generation. This paper emphasizes the importance of nonexclusive indexing and segmentation methods, identifies the challenges, and proposes a universal architecture for nonexclusive segmentation and indexing as a pre-processor for audio information mining, metadata extraction and scene analysis. Related feature selection, pattern recognition and signal processing algorithms are presented and test results are discussed.
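Nonexclusive indexing as described above is essentially multi-label classification of short audio frames: a frame may carry the speech, music and ambient tags at the same time. A minimal sketch of that idea, assuming pre-computed per-frame feature vectors and one independent (hypothetical, untrained) detector per class; it illustrates the overlap-friendly labelling, not the paper's actual architecture or features:

```python
import numpy as np

LABELS = ("speech", "music", "ambient")

def nonexclusive_index(features, detectors, threshold=0.5):
    """Assign each frame every label whose detector fires (labels may overlap).

    features  : (n_frames, n_dims) array of per-frame audio features
    detectors : dict label -> callable returning P(label | feature vector)
    """
    index = []
    for x in features:
        tags = {lab for lab in LABELS if detectors[lab](x) >= threshold}
        index.append(tags)          # empty set = silence / unknown
    return index

# Toy usage with hypothetical, randomly weighted logistic detectors.
rng = np.random.default_rng(0)
W = {lab: rng.normal(size=13) for lab in LABELS}                      # placeholder weights
detectors = {lab: (lambda x, w=W[lab]: 1 / (1 + np.exp(-x @ w))) for lab in LABELS}
frames = rng.normal(size=(5, 13))                                     # e.g. 13 MFCCs per frame
print(nonexclusive_index(frames, detectors))
```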
Complexity pursuit for unifying time series
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6743982
Yumin Yang
Complexity pursuit is a recently developed algorithm that uses gradient descent to separate interesting components from time series. It extends projection pursuit to time series data and is closely related to blind separation of time-dependent source signals and to independent component analysis. The goal is to find projections of the time series that have interesting structure, defined using criteria related to Kolmogorov complexity or coding length. In this paper, we derive a simple approximation of the coding length that takes into account the non-Gaussianity, the autocorrelations and the variance nonstationarity of the time series, and we give a simple algorithm for its approximate optimization.
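For a concrete picture, the basic complexity-pursuit iteration can be sketched as follows (after Hyvärinen's original formulation): project pre-whitened data, fit a lag-1 autoregressive predictor, and push the projection towards maximally non-Gaussian prediction residuals. The paper's coding-length approximation additionally models variance nonstationarity, which this sketch omits:

```python
import numpy as np

def complexity_pursuit(Z, n_iter=200, lr=0.1, seed=0):
    """One-unit complexity pursuit on a pre-whitened multivariate series Z (dims x T).

    Finds w so that the AR(1) prediction residual of y = w'Z is maximally
    non-Gaussian, a crude stand-in for minimum coding length.
    """
    rng = np.random.default_rng(seed)
    d, T = Z.shape
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z
        # lag-1 AR coefficient of the current projection
        alpha = np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])
        r = y[1:] - alpha * y[:-1]                 # innovation / prediction residual
        r = r / (r.std() + 1e-12)
        g = np.tanh(r)                             # contrast derivative (log-cosh nongaussianity)
        grad = (Z[:, 1:] - alpha * Z[:, :-1]) @ g / (T - 1)
        w += lr * grad
        w /= np.linalg.norm(w)                     # stay on the unit sphere
    return w
```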
Error feedback based lexical entity extraction for Chinese language modeling
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6743873
Yi Liu, Jing Hua, Xiangang Li, Xihong Wu
Chinese, unlike Western languages, has no standard definition of a word. Choosing a suitable lexicon therefore plays an important role in Chinese language modeling. This paper proposes a novel method for constructing the lexicon automatically. Rather than depending on statistical measures of text features, the method is driven directly by the errors fed back from the corresponding task, here phoneme-to-grapheme conversion. The whole process consists of two iterative phases: selection of individual words from a large manual lexicon, and further extraction of compound words built on Phase One. Experiments on phoneme-to-grapheme conversion show that the method achieves 1.09% and 0.38% absolute reductions in character error rate for Phase One and Phase Two respectively, compared with baseline lexicons of the same size generated by the conventional word-frequency-based method.
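The Phase One loop can be pictured as a greedy search in which a candidate word is admitted only if it lowers the downstream error. A schematic sketch, where evaluate_cer is a hypothetical hook standing in for retraining the language model and scoring phoneme-to-grapheme conversion on a development corpus:

```python
def build_lexicon(seed_lexicon, candidates, dev_corpus, target_size, evaluate_cer):
    """Greedy error-feedback lexicon selection (schematic Phase One).

    evaluate_cer(lexicon, corpus) is a hypothetical callback that rebuilds the
    language model with the given lexicon and returns the phoneme-to-grapheme
    character error rate on the development corpus.
    """
    lexicon = set(seed_lexicon)
    best_cer = evaluate_cer(lexicon, dev_corpus)
    for word in sorted(candidates, key=len, reverse=True):
        if len(lexicon) >= target_size:
            break
        cer = evaluate_cer(lexicon | {word}, dev_corpus)
        if cer < best_cer:                  # keep a candidate only if it reduces errors
            lexicon.add(word)
            best_cer = cer
    return lexicon, best_cer
```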
Fast head pose estimation using depth data
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6745249
Ti-zhou Qiao, S. Dai
To estimate head pose precisely and in real time with computer vision techniques, an enhanced framework using depth data and a random regression forest is implemented for head pose estimation. The framework relies on recognising the head position and a direction point to obtain the pose. When training the random forest, a decision function derived from Haar-like features is used as the binary test, and the test draws on features such as Gaussian curvature and mean curvature in addition to depth values and normal vectors. A large training dataset of head range images is generated by virtual structured-light scanning. The votes from all patches are filtered by clustering and mean shift, and their mean is used to estimate the positions of the feature points. Performance evaluation shows accurate pose estimation (success rate above 90%) while running at real-time speed.
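The vote-aggregation step, clustering plus mean shift over the forest's per-patch predictions, can be sketched as follows; the bandwidth and the plain Gaussian kernel are illustrative choices, not the paper's settings:

```python
import numpy as np

def mean_shift_votes(votes, bandwidth=40.0, n_iter=20):
    """Aggregate 3-D patch votes (e.g. head-centre positions in mm) by mean shift.

    votes : (n, 3) array of per-patch predictions from the regression forest.
    Returns the densest mode, used here as the head-position estimate.
    """
    mode = votes.mean(axis=0)
    for _ in range(n_iter):
        d2 = np.sum((votes - mode) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))        # Gaussian kernel weights
        mode = (w[:, None] * votes).sum(axis=0) / w.sum()
    return mode

# Toy usage: 200 noisy votes around a true centre at (10, -20, 850) mm.
rng = np.random.default_rng(0)
votes = rng.normal(loc=(10.0, -20.0, 850.0), scale=15.0, size=(200, 3))
print(mean_shift_votes(votes))
```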
Biometric-Kerberos authentication scheme for secure mobile computing services
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6743949
F. Han, M. Alkhathami, R. van Schyndel
Kerberos is an authentication protocol in which client and server mutually authenticate each other across an insecure network connection. After identity authentication, client and server can encrypt all subsequent communications to ensure privacy and data integrity. In this paper, a biometric Kerberos-based user identity authentication scheme is presented. In the scheme, a smartphone with computing capability and an internal camera is the only device required at the user end. The combination of the owner's biometrics and device information is used for identity authentication. A watermark links the device to its user; it is produced and embedded entirely with the internal functions of the smartphone, and the watermark embedding key is a by-product of the Kerberos authentication. Only the trusted key distribution center has enough knowledge to detect and remove the watermark. The ticket granting permission to access an application resource is issued only upon successful biometric authentication. The watermark also offers forensic traceability in a resource-constrained environment. As a result, cost-effective strong security can be attained in mobile computing services.
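One way to read the key-handling claim is that the watermark-embedding key falls out of the Kerberos exchange itself, so only the client and the key distribution center can reproduce it. A schematic sketch of such a derivation using a generic HMAC construction; the actual scheme's key derivation and watermarking functions are not specified in the abstract, so the names and inputs below are assumptions:

```python
import hashlib
import hmac
import os

def derive_watermark_key(session_key: bytes, device_id: str, nonce: bytes) -> bytes:
    """Illustrative only: derive a watermark-embedding key as a by-product of the
    Kerberos session key, binding the device identity to the authenticated user."""
    return hmac.new(session_key, device_id.encode() + nonce, hashlib.sha256).digest()

# Schematic client-side step: the AS-issued session key (placeholder here) and a
# fresh nonce yield the key used to embed the watermark into the captured biometric.
session_key = os.urandom(32)
wm_key = derive_watermark_key(session_key, "phone-1234", os.urandom(16))
print(wm_key.hex())
```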
A novel resource allocation scheme based on multi-satellite terminals in MF-TDMA satellite systems
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6743909
D. Qiu, Jiancheng Yu, X. Lu
Future satellite communication systems should be able to accommodate integrated services with a variety of applications and fulfil Quality of Service requirements. Given the limited and costly resources, a novel resource allocation scheme for multi-frequency time-division multiple access (MF-TDMA) systems is proposed. To optimize packing performance, the users requesting more time slots are packed first. The algorithm's delay and channel-utilization performance is validated through simulations for different types of traffic and compared with classical schemes. The theoretical analysis and simulation results show that the new algorithm efficiently improves channel utilization in satellite resource allocation, especially when the number of satellite terminals is large. The algorithm is simple and easy to implement in satellite communication systems.
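The "users with more time slots are packed first" rule is essentially first-fit-decreasing bin packing of slot requests onto MF-TDMA carriers. A minimal sketch, assuming fixed-size carriers and that each terminal's request fits on a single carrier:

```python
def pack_requests(requests, slots_per_carrier):
    """Pack per-terminal slot requests into MF-TDMA carriers, largest request first.

    requests : dict terminal_id -> number of time slots requested.
    Returns carrier index -> list of (terminal_id, n_slots) assignments.
    """
    carriers = []                                   # each entry: [free_slots, assignments]
    for tid, n in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        for c in carriers:                          # first carrier with enough free slots
            if c[0] >= n:
                c[0] -= n
                c[1].append((tid, n))
                break
        else:                                       # otherwise open a new carrier in the frame
            carriers.append([slots_per_carrier - n, [(tid, n)]])
    return {i: c[1] for i, c in enumerate(carriers)}

print(pack_requests({"t1": 5, "t2": 3, "t3": 7, "t4": 2}, slots_per_carrier=8))
```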
A voice activity detection algorithm with sub-band detection based on time-frequency characteristics of Mandarin
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6743871
Yinfeng Wang, S. Huang, Ying Wei
Voice activity detection algorithms are widely used in voice compression, speech synthesis, speech recognition and speech enhancement. In this paper, an efficient voice activity detection algorithm with sub-band detection based on the time-frequency characteristics of Mandarin is proposed. The proposed sub-band detection consists of two parts, crosswise detection and lengthwise detection, with both energy detection and pitch detection taken into account. For better performance, a double-threshold criterion is used to reduce the misjudgment rate of the detection. Performance evaluation is based on six noise environments with different SNRs. Experimental results indicate that the proposed algorithm can detect voice regions effectively in non-stationary and low-SNR environments and has potential for further improvement.
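The double-threshold criterion mentioned above is commonly realised as follows: a segment is declared speech only when some frame exceeds a high threshold, and the segment is then extended while neighbouring frames stay above a lower one. A minimal energy-only sketch (the paper additionally combines pitch detection and per-sub-band decisions):

```python
import numpy as np

def double_threshold_vad(frame_energy, high, low):
    """Frame-level double-threshold decision over a 1-D array of frame energies.

    Speech is triggered by the high threshold and grown in both directions while
    the energy stays above the low threshold (a simple hangover scheme).
    """
    n = len(frame_energy)
    speech = np.zeros(n, dtype=bool)
    i = 0
    while i < n:
        if frame_energy[i] >= high:
            j, k = i, i
            while j > 0 and frame_energy[j - 1] >= low:
                j -= 1
            while k + 1 < n and frame_energy[k + 1] >= low:
                k += 1
            speech[j:k + 1] = True
            i = k + 1
        else:
            i += 1
    return speech

print(double_threshold_vad(np.array([0.1, 0.4, 0.9, 0.5, 0.3, 0.1]), high=0.8, low=0.3))
```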
Design and verification of a non-invasive oil temperature measurement instrument
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6743883
W. Cai, Hui Li, W. Xu, M. Dai
In this paper, a non-invasive oil temperature measurement instrument, together with an oil temperature calculation model based on the pipe surface and ambient temperatures, is presented. To validate the model, a verification device was developed that can measure the oil, pipe surface and ambient temperatures simultaneously. A series of model verification and optimization tests under different working conditions were carried out on the device. The experimental results show that the measurement instrument based on the optimized calculation model can perform non-invasive oil temperature measurement accurately.
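The paper's calibrated model is not given in the abstract, but a plausible steady-state reading is that the heat flux through the pipe wall equals the flux lost to the ambient air, so the oil temperature can be back-calculated from the surface and ambient readings with a single fitted coefficient. A hypothetical sketch of that form, with the coefficient k standing in for a value fitted from the verification-device measurements:

```python
def estimate_oil_temperature(t_surface, t_ambient, k):
    """Steady-state thermal-resistance sketch (not the paper's calibrated model):
    equal heat flux through the wall and to the air gives
    T_oil ~= T_surface + k * (T_surface - T_ambient), with k = R_wall / R_air.
    """
    return t_surface + k * (t_surface - t_ambient)

print(estimate_oil_temperature(t_surface=62.0, t_ambient=25.0, k=0.15))  # ~67.6 degC
```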
Soft decision based Laplacian model factor estimation for noisy speech enhancement
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6743878
S. Ou, Haidong Sun, Yanqin Zhang, Ying Gao
Estimation of the Laplacian model factor is a critical step in noisy speech enhancement techniques that employ a Laplacian statistical model prior for clean speech. In this letter, we propose a novel estimation algorithm for this parameter based on soft decision in the discrete cosine transform domain. Since the speech signal is not always present in every component of the noisy signal, we first compute the speech presence probability, decided for each discrete cosine transform component, and then, based on minimum mean square error estimation theory, estimate the Laplacian model factor in the speech-presence stage. Simulation results demonstrate that the proposed algorithm outperforms the conventional method under different noise conditions and levels.
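A simplified stand-in for the soft-decision idea: let each DCT coefficient contribute to the Laplacian scale estimate in proportion to its speech presence probability. This is a maximum-likelihood-style sketch, not the paper's MMSE derivation, and the presence probabilities are assumed to be computed elsewhere:

```python
import numpy as np

def laplacian_scale_soft(dct_coeffs, speech_prob):
    """Soft-decision estimate of the Laplacian scale ('model factor') of clean speech.

    dct_coeffs  : per-component DCT coefficients of the current frame
    speech_prob : per-component speech presence probabilities in [0, 1]
    The ML scale of a Laplacian is E|x|, here weighted by presence probability.
    """
    p = np.asarray(speech_prob, dtype=float)
    x = np.abs(np.asarray(dct_coeffs, dtype=float))
    return float((p * x).sum() / (p.sum() + 1e-12))

print(laplacian_scale_soft([0.8, -0.1, 1.5, 0.05], [0.9, 0.1, 0.95, 0.05]))
```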
Polarimetric detection for vector-sensor array in quaternion Gaussian proper noise
Pub Date: 2013-12-01 | DOI: 10.1109/CISP.2013.6745232
Yikai Wang, W. Xia, Zishu He
The passive polarimetric detection problem for a vector-sensor array is formulated using quaternions. Quaternion-formed detectors are derived and analyzed in the presence of different types of proper quaternion noise. The properness of quaternion Gaussian noise, in particular the C-proper and Q-proper cases considered in this paper, is discussed based on the second-order statistics (SOS) of the complex components of the quaternion noise. Numerical simulations and statistical analysis demonstrate that the C-proper, Q-proper and complex detectors are equivalent in the Q-proper case, and that the Q-proper detector performs worse than the C-proper and complex detectors in the C-proper case.
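The quaternion formulation starts from the Cayley-Dickson pairing of the two complex polarization channels of each vector sensor, q = z1 + z2*j. A minimal sketch of that construction together with a simple energy-type statistic; the paper's actual detectors and the properness analysis are not reproduced here, and the toy data below are placeholders:

```python
import numpy as np

def to_quaternion(z1, z2):
    """Cayley-Dickson pairing q = z1 + z2*j of two complex channels, stored as a
    (..., 4) real array in (1, i, j, k) order."""
    z1, z2 = np.asarray(z1), np.asarray(z2)
    return np.stack([z1.real, z1.imag, z2.real, z2.imag], axis=-1)

def qmul(p, q):
    """Hamilton product of quaternions stored as (..., 4) real arrays."""
    pw, px, py, pz = np.moveaxis(p, -1, 0)
    qw, qx, qy, qz = np.moveaxis(q, -1, 0)
    return np.stack([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw], axis=-1)

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

# Toy single-sensor snapshot: average of q(t) q*(t); its scalar part is |z1|^2 + |z2|^2.
rng = np.random.default_rng(1)
z1 = rng.normal(size=256) + 1j * rng.normal(size=256)   # horizontal polarization channel
z2 = rng.normal(size=256) + 1j * rng.normal(size=256)   # vertical polarization channel
q = to_quaternion(z1, z2)
energy = qmul(q, qconj(q))[..., 0].mean()
print(energy)
```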