A new filtering method for RST invariant image watermarking
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244733
Yan Liu, Jiying Zhao
Based on log-polar mapping, this paper presents a new filtering method. We compare the new filter with the classical matched filter, phase-only filter, binary phase-only filter, amplitude-only filter, and inverse filter, and find that it is the only one robust against rotation, scaling, and translation (RST) transformations. We use the filter in our new RST-invariant digital image watermarking scheme to rectify the watermark position. The watermarking scheme does not need the original image to extract the watermark and avoids exhaustive search. Three-dimensional plots of the cross-correlation functions of the different filters are presented and discussed.
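The key idea, log-polar mapping, converts rotation and scaling about the image center into translations along the angular and radial axes, which is what makes an RST-robust filter possible. A minimal numpy sketch of the resampling step follows; the grid sizes and nearest-neighbour lookup are illustrative choices, not the authors' implementation:

```python
import numpy as np

def log_polar_map(img, n_rho=64, n_theta=64):
    """Resample a grayscale image onto a log-polar grid.

    Rotation about the image center becomes a shift along the theta
    axis; uniform scaling becomes a shift along the rho axis, which
    is why log-polar coordinates help with RST invariance.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_radius = np.hypot(cy, cx)
    # log-spaced radii and uniformly spaced angles
    rho = np.exp(np.linspace(0.0, np.log(max_radius), n_rho))
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    r, t = np.meshgrid(rho, theta, indexing="ij")
    # nearest-neighbour lookup back into Cartesian coordinates
    ys = np.clip(np.round(cy + r * np.sin(t)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + r * np.cos(t)).astype(int), 0, w - 1)
    return img[ys, xs]

if __name__ == "__main__":
    img = np.random.rand(128, 128)
    print(log_polar_map(img).shape)  # (64, 64)
```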
{"title":"A new filtering method for RST invariant image watermarking","authors":"Yan Liu, Jiying Zhao","doi":"10.1109/HAVE.2003.1244733","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244733","url":null,"abstract":"Based on log-polar mapping, this paper presents a new filtering method. We compare our new filtering method with the classical matched filter, phase-only filter, binary phase-only filter, amplitude-only filter, and inverse filter. We found that our new method is the only one that is robust against rotation, scaling, and translation (RST) transformation. We use the filtering method in our new rotation, scaling, and translation (RST) invariant digital image watermarking scheme, to rectify the watermark position. The watermarking scheme does not need the original image to extract watermark and avoids exhaustive search. The three-dimensional plots of cross-correlation functions of different filters are presented and discussed.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126624270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel semi-fragile audio watermarking scheme
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244731
Ronghui Tu, Jiying Zhao
In this paper, we present a semi-fragile audio watermarking technique that embeds a watermark in the discrete wavelet domain of an audio signal by quantizing selected coefficients. The quantization parameter used in the algorithm is user-defined; different values of the parameter affect the robustness of the watermark, allowing the user to control the watermark's performance. Applications of our approach include copyright verification and content authentication.
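The abstract describes quantizing selected wavelet coefficients under a user-defined parameter. Below is a minimal sketch of one common way to do this, quantization index modulation on the approximation coefficients using PyWavelets; the wavelet, coefficient selection, and delta are assumptions, not the paper's exact choices:

```python
import numpy as np
import pywt

def embed_bits(audio, bits, delta=0.05, level=3):
    """Embed watermark bits by quantizing selected DWT coefficients.

    Each selected approximation coefficient is snapped to an even or
    odd multiple of delta according to the bit. `delta` plays the
    role of the user-defined quantization parameter: larger values
    survive more distortion but are more audible.
    """
    coeffs = pywt.wavedec(audio, "db4", level=level)
    approx = coeffs[0]
    for i, bit in enumerate(bits):
        q = np.round(approx[i] / delta)
        if int(q) % 2 != bit:   # force quantizer parity to match the bit
            q += 1
        approx[i] = q * delta
    return pywt.waverec(coeffs, "db4")

def extract_bits(audio, n_bits, delta=0.05, level=3):
    """Recover bits from the parity of the re-quantized coefficients."""
    approx = pywt.wavedec(audio, "db4", level=level)[0]
    return [int(np.round(approx[i] / delta)) % 2 for i in range(n_bits)]
```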
{"title":"A novel semi-fragile audio watermarking scheme","authors":"Ronghui Tu, Jiying Zhao","doi":"10.1109/HAVE.2003.1244731","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244731","url":null,"abstract":"In this paper, we present a semi-fragile audio watermarking technique which embeds watermark in the discrete wavelet domain of an audio by quantizing the selected coefficients. The quantization parameter used in the algorithm is user-defined. Different value of the quantization parameter affects the robustness of the watermark. This allows the user to control the performance of the watermark. The application of our approach could be copyright verification or content authentication.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116489865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optical character recognition for model-based object recognition applications
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244729
Qing Chen, E. Petriu
This paper discusses the performance of Fourier descriptors and Hu's seven moment invariants for an Optical Character Recognition (OCR) engine developed for 3D model-based object recognition applications.
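Both feature sets are compact shape descriptors. A brief numpy sketch of how each is typically computed; the normalization details are standard textbook choices, not necessarily the paper's:

```python
import numpy as np

def fourier_descriptors(boundary, n_keep=16):
    """Normalized Fourier descriptors of an (N, 2) contour.

    Treating each boundary point as a complex number, the FFT
    magnitudes (dropping the DC term and dividing by the first
    harmonic) are invariant to translation, rotation, and scale.
    """
    z = boundary[:, 0] + 1j * boundary[:, 1]
    f = np.abs(np.fft.fft(z))
    return f[1:n_keep + 1] / f[1]

def central_moment(img, p, q):
    """Central image moment mu_pq of a grayscale/binary image."""
    h, w = img.shape
    y, x = np.mgrid[:h, :w]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    return ((x - xc) ** p * (y - yc) ** q * img).sum()

def hu_first_two(img):
    """First two of Hu's seven moment invariants, from normalized
    central moments eta_pq = mu_pq / mu_00^(1 + (p+q)/2)."""
    m00 = central_moment(img, 0, 0)
    eta = lambda p, q: central_moment(img, p, q) / m00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```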
{"title":"Optical character recognition for model-based object recognition applications","authors":"Qing Chen, E. Petriu","doi":"10.1109/HAVE.2003.1244729","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244729","url":null,"abstract":"This paper discusses the performance of Fourier descriptors and Hu's seven moment invariants for an Optical Character Recognition (OCR) engine developed for 3D model-based object recognition applications.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"70 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120907638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mechanics modeling for virtual interactive environments
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244725
Jean-Christian Delannoy, E. Petriu, P. Wide
Many of the virtual environments on the market today account only for the 3D geometry and lighting of objects. These environments may appear realistic while static, but when objects are set into motion their movements often appear unnatural. This paper presents algorithms to model the world around us more accurately by accounting for the mechanical behaviors and properties of objects, and by basing the virtual world on sensor information provided by objects in the real world.
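As a concrete illustration of the kind of mechanical behavior involved, here is a minimal semi-implicit Euler integrator for a point mass bouncing on a spring-damper ground plane; all parameter values are illustrative and not taken from the paper:

```python
import numpy as np

def simulate_bounce(steps=2000, dt=1e-3, m=1.0, k=5000.0, c=20.0):
    """Semi-implicit Euler integration of a point mass dropped onto
    a spring-damper ground plane -- the sort of mechanics a purely
    geometric virtual environment omits.
    """
    g = np.array([0.0, -9.81])
    pos = np.array([0.0, 1.0])   # start 1 m above the ground
    vel = np.zeros(2)
    trace = []
    for _ in range(steps):
        force = m * g
        if pos[1] < 0.0:         # penalty contact with the ground
            force[1] += -k * pos[1] - c * vel[1]
        vel += dt * force / m    # update velocity first ...
        pos += dt * vel          # ... then position (semi-implicit)
        trace.append(pos[1])
    return trace

if __name__ == "__main__":
    heights = simulate_bounce()
    print(f"min height {min(heights):.3f} m, final {heights[-1]:.3f} m")
```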
{"title":"Mechanics modeling for virtual interactive environments","authors":"Jean-Christian Delannoy, E. Petriu, P. Wide","doi":"10.1109/HAVE.2003.1244725","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244725","url":null,"abstract":"Many of the virtual environments on the market today only account for 3D geometry and lighting of objects. These environments may appear realistic in a static domain but when the objects are set into motion the object's movements often appear unnatural. This paper presents algorithms to model the world around us more accurately by accounting for the mechanical behaviors and properties of objects, and by basing the virtual world on sensor information provided by objects in the real world.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121631187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image quality measurement by using digital watermarking
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244727
D. Zheng, Jiying Zhao, W. J. Tam, F. Speranza
This paper presents an objective picture quality measurement method based on fragile digital image watermarking. Building on a DCT-based watermarking scheme, it presents a fragile watermarking scheme that can work as an automatic quality monitoring system. We embed the watermark in the DCT domain of the original image, and the DCT blocks used for embedding are carefully selected so that degradation of the watermark reflects degradation of the image. The evaluations demonstrate the effectiveness of the proposed scheme against JPEG compression.
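A sketch of the embed/extract/score cycle such a scheme implies, using 8x8 DCT blocks; the coefficient position, strength, and scoring rule are illustrative assumptions, not the authors' selections:

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_block(block, bit, strength=4.0):
    """Embed one watermark bit in a mid-frequency DCT coefficient
    of an 8x8 block. The paper selects blocks so that watermark
    degradation tracks image degradation; here the position (3, 4)
    is a placeholder choice.
    """
    d = dctn(block, norm="ortho")
    d[3, 4] = strength if bit else -strength
    return idctn(d, norm="ortho")

def extract_block(block):
    """Read the bit back from the sign of the same coefficient."""
    return int(dctn(block, norm="ortho")[3, 4] > 0)

def quality_score(bits_in, bits_out):
    """Fraction of surviving watermark bits, used as a proxy for
    picture quality after, e.g., JPEG compression."""
    return np.mean(np.array(bits_in) == np.array(bits_out))
```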
{"title":"Image quality measurement by using digital watermarking","authors":"D. Zheng, Jiying Zhao, W. J. Tam, F. Speranza","doi":"10.1109/HAVE.2003.1244727","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244727","url":null,"abstract":"This paper presents an objective picture quality measurement method based on the fragile digital image watermarking. Based on the DCT-based watermarking scheme, this paper presents a fragile digital image watermarking scheme that can work as an automatic quality monitoring system. We embed watermark in the DCT domain of original image, and the DCT blocks for embedding are carefully selected so that the degradation of the watermark can reflect the degradation of the image. The evaluations demonstrate the effectiveness of the proposed scheme against JPEG compression.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128088774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A heterogeneous scalable architecture for collaborative haptics environments
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244735
Xiaojun Shen, Francis Bogsanyi, L. Ni, N. Georganas
The purpose of this research effort is to design a generic architecture for collaborative haptic, audio, and visual environments (C-HAVE). We aim to develop a heterogeneous, scalable architecture for large collaborative haptic environments in which a number of users participate with different kinds of haptic devices. This paper begins with a brief overview of C-HAVE and then describes a generic architecture implemented over HLA/RTI (High Level Architecture/Run-Time Infrastructure), an IEEE standard for distributed simulation and modeling. A potential electronic commerce application over C-HAVE is discussed.
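To illustrate the publish/subscribe interaction pattern that HLA-style architectures build on, here is a deliberately simplified toy in plain Python. It is emphatically not the HLA/RTI API, only an analogy for how federates declare interest in object classes and receive attribute updates:

```python
from collections import defaultdict

class ToyRTI:
    """Toy publish/subscribe hub. NOT the HLA/RTI API -- a minimal
    analogy: participants subscribe to object classes and receive
    attribute updates published by others."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, object_class, callback):
        self.subscribers[object_class].append(callback)

    def update_attributes(self, object_class, attributes):
        for cb in self.subscribers[object_class]:
            cb(attributes)

rti = ToyRTI()
# a haptic participant tracks the shared object's pose
rti.subscribe("SharedObject", lambda a: print("haptic device sees", a))
# a visual participant publishes a position update
rti.update_attributes("SharedObject", {"x": 0.1, "y": 0.2, "z": 0.0})
```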
{"title":"A heterogeneous scalable architecture for collaborative haptics environments","authors":"Xiaojun Shen, Francis Bogsanyi, L. Ni, N. Georganas","doi":"10.1109/HAVE.2003.1244735","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244735","url":null,"abstract":"The purpose of this research effort is to design a generic architecture for collaborative haptic, audio, visual environments (C-HAVE). We aim to develop a heterogeneous scalable architecture for large collaborative haptics environments where a number of potential users participate with different kinds of haptic devices. This paper begins with a brief overview of C-HAVE and then proceeds to describe a generic architecture that is implemented over HLA/RTI (High Level Architecture/Run Time Infrastructure), an IEEE standard for distributed simulations and modeling. A potential electronic commerce application over C-HAVE is discussed.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130473282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pitch-based feature extraction for audio classification
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244723
A.R. Abu-El-Quran, R. Goubran
This paper proposes a new algorithm to discriminate between speech and non-speech audio segments. It is intended for security applications as well as talker location identification in audio conferencing systems equipped with microphone arrays. The proposed method splits the audio segment into small frames and detects the presence of pitch in each one. The ratio of frames with detected pitch to the total number of frames is defined as the pitch ratio and is used as the main feature to classify speech and non-speech segments. The performance of the proposed method is evaluated using a library of audio segments containing female and male speech, as well as non-speech segments such as computer fan noise, cocktail-party noise, footsteps, and traffic noise. The proposed algorithm achieves correct decisions for 97% of speech and 98% of non-speech segments of 0.5 s duration.
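A compact sketch of the pitch-ratio feature as described: frame the segment, run a crude autocorrelation pitch detector per frame, and take the ratio. The frame length, pitch range, and threshold are illustrative assumptions:

```python
import numpy as np

def has_pitch(frame, fs, fmin=60.0, fmax=400.0, threshold=0.3):
    """Crude autocorrelation pitch detector for one frame: look for
    a strong peak at a lag matching a plausible speech fundamental
    (60-400 Hz)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return False
    ac = ac / ac[0]
    lo, hi = int(fs / fmax), int(fs / fmin)
    return ac[lo:hi].max() > threshold

def pitch_ratio(segment, fs, frame_ms=30):
    """Ratio of frames with detected pitch to total frames -- the
    feature the paper uses to separate speech from non-speech."""
    n = int(fs * frame_ms / 1000)
    frames = [segment[i:i + n] for i in range(0, len(segment) - n, n)]
    return float(np.mean([has_pitch(f, fs) for f in frames]))

if __name__ == "__main__":
    fs = 8000
    t = np.arange(int(0.5 * fs)) / fs
    voiced = np.sign(np.sin(2 * np.pi * 150 * t))  # strongly periodic
    print(pitch_ratio(voiced, fs), pitch_ratio(np.random.randn(len(t)), fs))
```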
{"title":"Pitch-based feature extraction for audio classification","authors":"A.R. Abu-El-Quran, R. Goubran","doi":"10.1109/HAVE.2003.1244723","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244723","url":null,"abstract":"This paper proposes a new algorithm to discriminate between speech and non-speech audio segments. It is intended for security applications as well as talker location identification in audio conferencing systems, equipped with microphone arrays. The proposed method is based on splitting the audio segment into small frames and detecting the presence of pitch on each one of them. The ratio of frames with pitch detected to the total number of frames is defined as the pitch ratio and is used as the main feature to classify speech and non-speech segments. The performance of the proposed method is evaluated using a library of audio segments containing female and male speech, and non-speech segments such as computer fan noise, cocktail noise, footsteps, and traffic noise. It is shown that the proposed algorithm can achieve correct decision of 97% for the speech and 98% for non-speech segments, 0.5-seconds long.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125149576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lip feature extraction using motion, color, and edge information
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244716
R. Dansereau, C. Li, R. Goubran
In this paper, we present a Markov random field-based technique for extracting lip features from video using color and edge information. Motion between frames is used to locate the approximate lip region, while color and edge information allow the boundaries of naturally covered lips to be identified and segmented from the rest of the face. Geometric lip features are then extracted from the segmented lip area. Experimental results show 96% accuracy in extracting six key lip feature points in typical talking-head video sequences when the tongue is not visible in the scene, and 90% accuracy when the tongue is visible.
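A heavily simplified stand-in for the first two cues (motion and color) is sketched below; the actual method refines such a mask with edge information and a Markov random field, and the thresholds here are illustrative:

```python
import numpy as np

def lip_region_mask(prev_rgb, curr_rgb, motion_thresh=15, red_ratio=1.4):
    """Rough lip-region mask from two consecutive RGB frames.

    Frame differencing flags the moving (talking) region; a
    red-dominance test keeps pixels redder than surrounding skin.
    Both thresholds are placeholder values for illustration.
    """
    motion = np.abs(curr_rgb.astype(int) - prev_rgb.astype(int)).sum(axis=2)
    moving = motion > motion_thresh
    r = curr_rgb[..., 0].astype(float)
    g = curr_rgb[..., 1].astype(float) + 1e-6
    lip_colored = (r / g) > red_ratio   # lips are redder than skin
    return moving & lip_colored

if __name__ == "__main__":
    prev = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
    curr = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
    print(lip_region_mask(prev, curr).sum(), "candidate lip pixels")
```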
{"title":"Lip feature extraction using motion, color, and edge information","authors":"R. Dansereau, C. Li, R. Goubran","doi":"10.1109/HAVE.2003.1244716","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244716","url":null,"abstract":"In this paper, we present a Markov random field based technique for extracting lip features from video using color and edge information. Motion between frames is used as an indicator to locate the approximate lip region, while color and edge information allow boundaries of naturally covered lips to be identified and segmented from the rest of the face. Using the lip region, geometric lip features are then extracted from the segmented lip area. The experimental results show that 96% accuracy is obtained in extracting six key lip feature points in typical talking head video sequences when the tongue is not visible in the scene, and 90% accuracy when the tongue is visible.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130899557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural network architecture for 3D object representation
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244721
A. Crétu, E. Petriu, G. Patry
The paper discusses a neural network architecture for 3D object modeling. A multi-layered feedforward structure that takes the 3D coordinates of object points as inputs is employed to model the object space. Cascaded with a transformation neural network module, the proposed architecture can be trained to represent and generate 3D objects, and to perform transformations, set operations, and object morphing. A possible application to object recognition is also presented.
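The coordinate-input idea can be sketched as a tiny implicit-surface classifier: a feedforward network maps a 3D point to a scalar whose sign says inside/outside the object. The weights below are random, purely to show the data flow; the paper's architecture and training procedure are richer:

```python
import numpy as np

def mlp_inside(points, w1, b1, w2, b2):
    """Feedforward pass: 3D point -> tanh hidden layer -> scalar.

    The sign of the output classifies a point as inside or outside
    the object, so the trained network is itself an implicit
    representation of the 3D shape.
    """
    h = np.tanh(points @ w1 + b1)
    return h @ w2 + b2

rng = np.random.default_rng(0)
# toy network: 3 inputs, 16 hidden units, 1 output
w1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
pts = rng.uniform(-1, 1, size=(5, 3))   # query points in object space
print(mlp_inside(pts, w1, b1, w2, b2).ravel())
```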
{"title":"Neural network architecture for 3D object representation","authors":"A. Crétu, E. Petriu, G. Patry","doi":"10.1109/HAVE.2003.1244721","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244721","url":null,"abstract":"The paper discusses a neural network architecture for 3D object modeling. A multi-layered feedforward structure having as inputs the 3D-coordinates of the object points is employed to model the object space. Cascaded with a transformation neural network module, the proposed architecture can be used to generate and train 3D objects, perform transformations, set operations and object morphing. A possible application for object recognition is also presented.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134537220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Musical noise reduction in speech using two-dimensional spectrogram enhancement
Pub Date: 2003-11-10 | DOI: 10.1109/HAVE.2003.1244726
Zhong Lin, R. Goubran
This paper investigates the problem of "musical noise" and proposes a new algorithm to reduce it. Musical noise occurs in most spectral-estimation-based algorithms, such as spectral subtraction and the minimum mean-square error short-time spectral amplitude estimator (MMSE-STSA). To reduce this type of noise, a novel algorithm, called two-dimensional spectrogram enhancement, is proposed. A speech enhancement scheme is implemented by combining the proposed algorithm with the MMSE-STSA method. Spectrogram comparisons show that the proposed scheme effectively reduces musical noise relative to MMSE-STSA. SNR and PESQ evaluations show that the proposed method is superior to both MMSE-STSA and spectral subtraction with auditory masking.
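The core intuition is that musical noise consists of isolated time-frequency peaks left behind by spectral estimation, so smoothing the spectrogram across both time and frequency suppresses them. A minimal sketch using a 2D median filter as a stand-in for the paper's enhancement (the filter choice and sizes are assumptions):

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft, istft

def denoise(x, fs, noise_mag, smooth=(3, 3)):
    """Spectral subtraction followed by 2D spectrogram smoothing.

    `noise_mag` is the estimated noise magnitude per frequency bin.
    Subtraction leaves isolated time-frequency peaks heard as
    musical noise; filtering across BOTH axes (time and frequency)
    removes them while keeping the broader speech structure.
    """
    f, t, X = stft(x, fs=fs)
    mag, phase = np.abs(X), np.angle(X)
    mag = np.maximum(mag - noise_mag[:, None], 0.0)  # subtraction
    mag = median_filter(mag, size=smooth)            # 2D smoothing
    _, y = istft(mag * np.exp(1j * phase), fs=fs)
    return y

if __name__ == "__main__":
    fs = 8000
    x = np.random.randn(fs)                  # stand-in noisy signal
    f, _, N = stft(x, fs=fs)
    print(denoise(x, fs, np.abs(N).mean(axis=1)).shape)
```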
{"title":"Musical noise reduction in speech using two-dimensional spectrogram enhancement","authors":"Zhong Lin, R. Goubran","doi":"10.1109/HAVE.2003.1244726","DOIUrl":"https://doi.org/10.1109/HAVE.2003.1244726","url":null,"abstract":"This paper investigates the problem of \"musical noise\" and proposes a new algorithm to reduce it. Musical noise occurs in most of the spectral-estimation-based algorithms, such as spectral subtraction and minimum mean-square error short-time spectral amplitude estimator (MMSE-STSA). To reduce this type of noise, a novel algorithm, which is called two-dimensional spectogram enhancement, is proposed. A speech enhancement scheme is implemented by combining the proposed algorithm with the MMSE-STSA method. Spectogram comparisons show that with the proposed scheme, musical noise is effectively reduced with reference to MMSE-STSA. SNR and PESQ evaluations show that the proposed method is superior to MMSE-STSA and spectral subtraction with auditory masking method.","PeriodicalId":431267,"journal":{"name":"The 2nd IEEE Internatioal Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133596610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}