Automatic Video Text Localization and Recognition
A. Saracoglu, A. Alatan
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659917
Text embedded in digital media is an important tool for indexing and managing large-scale video databases. In this work, the localization performance of overlay text is analyzed using different feature extraction methods combined with different classifiers. In addition, improving the text recognition rate by using multiple hypotheses obtained from multilevel segmentation together with a statistical language model is investigated.
{"title":"Automatic Video Text Localization and Recognition","authors":"A. Saracoglu, A. Alatan","doi":"10.1109/SIU.2006.1659917","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659917","url":null,"abstract":"For the indexing and management of large scale video databases an important tool would be the text in the digital media. In this work, the localization performances of the overlay texts using different feature extraction methods with different classifiers are analyzed. Besides that in order to improve the text recognition rate by using multiple hypothesis obtained from multilevel segmentation and using statistical language model are investigated","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115442465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gabor Factor Analysis for 2D+3D Facial Landmark Localization
A. Salah
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659894
We propose a coarse-to-fine method for facial landmark localization that relies on unsupervised modeling of landmark features obtained through different Gabor filter channels. The input to the system is a registered near-frontal 2D and 3D face image pair with background clutter. The system aims at fully automatic detection of seven facial landmarks: the nose tip and the eye and mouth corners. A structural analysis subsystem is employed to detect incorrect landmarks and correct them. We compare our local features with two widely used Gabor-jet-based methods and illustrate their superior performance.
{"title":"Gabor Factor Analysis for 2D+3D Facial Landmark Localization","authors":"A. Salah","doi":"10.1109/SIU.2006.1659894","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659894","url":null,"abstract":"We propose a coarse-to-fme method for facial landmark localization that relies on unsupervised modeling of landmark features obtained through different Gabor filter channels. The input to the system is a registered near-frontal 2D and 3D face image pair, with background clutter. The system aims at complete automatic detection of seven facial landmarks; the nose tip, eye and mouth corners, respectively. A structural analysis subsystem is employed to detect incorrect landmarks and to correct them. We compare our local features with two widely used Gabor jet based methods, and illustrate their superior performance","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124335115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design of Hybrid Fiber Raman and EDFAs Operated by Optimal Performance
C. Berkdemir, S. Ozsoy
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659897
Optimal-performance calculations of the gain and pump power of hybrid amplifiers consisting of erbium-doped fiber amplifiers (EDFAs) and fiber Raman amplifiers, used to increase the transmission capacity of communication systems, are performed. For a hybrid amplifier system formed from a single-mode EDFA of 18 m length and two fiber Raman amplifiers of 6.5 and 7 km lengths, it is shown that a gain of 3-4 dB can be obtained for long-wavelength applications.
{"title":"Design of Hybrid Fiber Raman and EDFAs Operated by Optimal Performance","authors":"C. Berkdemir, S. Ozsoy","doi":"10.1109/SIU.2006.1659897","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659897","url":null,"abstract":"Optimal performance calculations in point of the gain and the pump power of hybrid amplifiers consisting of erbium-doped fiber amplifiers (EDFAs) and fiber Raman amplifiers for increasing the transmission capacity in communication systems are preformed. In the hybrid amplifier system which is formed from a single-mode EDFA, which has 18 m length, and two fiber Raman amplifiers, which have 6,5 and 7 km lengths, it is shown that the gain of 3-4 dB can be obtained for the long-wavelength applications","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114881402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Tool for Creating Calibrated Images
U. Yilmaz, O. Hellwich
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659836
A synthetic imaging tool that converts three-dimensional model descriptions given in OpenGL into images is described. Camera parameters are also extracted and attached to the images. The conversion of OpenGL matrices to calibration parameters and of calibration parameters to OpenGL matrices is explained in detail. Radial distortion is also modeled so that the images become more realistic. The libraries created in the scope of this work are made publicly available on the Internet at www.cv.tu-berlin.de/~ulas/rarf. When time or photographic equipment is lacking, the tools presented in this study can be valuable for researchers who want to test their surface modeling and calibration algorithms.
{"title":"A Tool for Creating Calibrated Images","authors":"U. Yilmaz, O. Hellwich","doi":"10.1109/SIU.2006.1659836","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659836","url":null,"abstract":"A synthetic imaging tool by which three-dimensional model descriptions given in OpenGL are converted into images, is described. Camera parameters are also extracted and attached to the images. Conversion of OpenGL matrices to calibration parameters and conversion of calibration parameters to OpenGL matrices are explained in detail. Radial distortion is also modeled, so that images become more realistic. The libraries, created in the scope of this work, are made publicly available over Internet under <www.cv.tu-berlin.de/~ulas/rarf>. In the lack of time or photographing equipment, the tools presented in this study would be vital for the researchers who want to test their surface modeling and calibration algorithms.","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"2017 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114697744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Language Modelling Approaches for Turkish Large Vocabulary Continuous Speech Recognition Based on Lattice Rescoring
E. Arisoy, M. Saraçlar
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659773
In this paper, we investigate several language modelling approaches for large vocabulary continuous speech recognition (LVCSR) of Turkish. The agglutinative nature of Turkish makes it a challenging language for speech recognition, since it is impossible to include all possible words in the recognition lexicon. Therefore, instead of using words as recognition units, we use a data-driven sub-word approach based on morphs. This method was previously applied to Finnish, Estonian and Turkish, and promising recognition results were achieved compared to word recognition units. On our database, we obtained word error rates (WER) of 38.8% for the baseline word-based model and 33.9% for the baseline morph-based model. In addition, we tried several new methods: recognition lattice outputs of each model were rescored with root-based and root-class-based models for the word-based case, and with a first-morph-based model for the morph-based case. The word-root composition approach achieves a 0.5% improvement in recognition performance; however, the other two approaches fail due to non-robust estimates relative to the baseline models.
{"title":"Language Modelling Approaches for Turkish Large Vocabulary Continuous Speech Recognition Based on Lattice Rescoring","authors":"E. Arisoy, M. Saraçlar","doi":"10.1109/SIU.2006.1659773","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659773","url":null,"abstract":"In this paper, we have tried some language modelling approaches for large vocabulary continuous speech recognition (LVCSR) of Turkish. The agglutinative nature of Turkish makes Turkish a challenging language in terms of speech recognition since it is impossible to include all possible words in the recognition lexicon. Therefore, instead of using words as recognition units, we use a data-driven sub-word approach called morphs. This method was previously applied to Finnish, Estonian and Turkish and promising recognition results were achieved compared to words as recognition units. In our database, we obtained word error rates (WER) of 38.8% for the baseline word-based model and 33.9% for the baseline morph-based model. In addition, we tried some new methods. Recognition lattice outputs of each model were rescored with the root-based and root-class-based models for the word-based case and first-morph-based model for the morph-based case. The word-root composition approach achieves a 0.5% increase in the recognition performance. However, other two approaches fail due to the non-robust estimates over the baseline models","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"81 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124018144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Three-Band Modeling Using Prominent Edges for Face Alignment
F. Kahraman, B. Kurt, M. Gokmen
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659716
A fundamental difficulty in face recognition systems is successful alignment of the human face in the input image. In recent years, model-based approaches have received particular attention, the most powerful of which is the active appearance model (AAM), which constructs a relation between shape and texture. Face alignment methods are required to work well even in the presence of illumination changes and affine transformations. The classical AAM extracts texture and shape information from the training images in the RGB color space, so it can only handle images with the same or a similar color distribution to the training set; it cannot align images obtained under lighting conditions different from those of the training images, even if the same person exists in the training database. In this study, we propose to use features that are less sensitive to illumination changes instead of directly using RGB colors. The proposed AAM is called the three-band AAM; the bands are hue, hill, and luminance. Prominent edge detection constitutes the most important part of the model. Experimental studies show that prominent edges depend much less on illumination changes than the original color space, and that three-band AAM based face alignment outperforms the classical AAM in terms of alignment precision.
{"title":"Three-Band Modeling Using Prominent Edges for Face Alignment","authors":"F. Kahraman, B. Kurt, M. Gokmen","doi":"10.1109/SIU.2006.1659716","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659716","url":null,"abstract":"Fundamental difficulty in face recognition systems is mainly related to successful human face alignment from the input image. In recent years, model based approaches get attention among others. Most powerful method among model-based approaches is known as active appearance model. The method accomplishes this by constructing a relation between shape and texture. Face alignment methods are required to work well even in the presence of illumination and affine transformation. Classical AAM extracts texture and shape information from the training image by using RGB color space. Classical AAM can only handle images having the same or similar color distribution to the images in the training set. Classical AAM cannot align images obtained under different lightning conditions from the training images even if the same person exists in the training database. In this study, we propose to use features which are shown to be less sensitive to illumination changes instead of directly using RGB colors. The proposed AAM is called three-band AAM. The bands are hue, hill, and luminance. Prominent edge detection constitutes the most important part of the model. Experimental studies show that prominent edges do not depend on illumination changes much when compared the original color space, and the three-band AAM based face alignment outperforms the classical AAM in terms of alignment precision","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124019824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reduction of Sensory Inaccuracy in Nonlinear Systems using Particle Filters
H. Bayram, A. Ertuzun, H. Bozma
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659715
In signal processing and control applications, on-line state estimation plays an important role in the stability of the system. In cases where the state and/or measurement functions are highly nonlinear and/or the noise is not Gaussian, conventional filters such as the extended Kalman filter do not provide satisfactory results. In this paper, particle filters and their application to a nonlinear problem are examined.
{"title":"Reduction of Sensory Inaccuracy in Nonlinear Systems using Particle Filters","authors":"H. Bayram, A. Ertuzun, H. Bozma","doi":"10.1109/SIU.2006.1659715","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659715","url":null,"abstract":"In signal processing and control applications, on-line state estimation plays important role in stability of the system. In cases where state and/or measurement functions are highly nonlinear and/or the noise is not Gaussian, conventional filters such as extended Kalman filters do not provide satisfactory results. In this paper, particle filters and its application to a nonlinear problem are examined","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126104334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Illumination Subspaces based Robust Face Recognition
D. Kern, H. K. Ekenel, R. Stiefelhagen
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659916
In this paper, a face recognition system based on illumination subspaces is presented. First, the dominant illumination directions are learned using a clustering algorithm; three main illumination directions are observed: frontal illumination, illumination from the left, and illumination from the right. After determining the dominant illumination direction classes, the face space is divided into these classes in order to separate the variations caused by illumination from the variations caused by different identities. An illumination-subspaces based face recognition approach is then used to benefit from the additional knowledge of the illumination direction. The proposed approach is tested on images from the illumination and lighting subsets of the CMU PIE database. The experimental results show that by utilizing knowledge of the illumination direction and using illumination-subspaces based face recognition, performance is significantly improved.
{"title":"Illumination Subspaces based Robust Face Recognition","authors":"D. Kern, H. K. Ekenel, R. Stiefelhagen, Aydinlanmadan Kaynaklanan","doi":"10.1109/SIU.2006.1659916","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659916","url":null,"abstract":"In this paper a face recognition system that is based on illumination subspaces is presented. In this system, first, the dominant illumination directions are learned using a clustering algorithm. Three main illumination directions are observed: Ones that have frontal illumination, illumination from left and right sides. After determining the dominant illumination direction classes, the face space is divided into these classes to separate the variations caused by illumination from the variations caused by different identities. Then illumination subspaces based face recognition approach is used to benefit from the additional knowledge of the illumination direction. The proposed approach is tested on the images from the illumination and lighting subsets of the CMU PIE database. The experimental results show that by utilizing knowledge of illumination direction and using illumination subspaces based face recognition, the performance is significantly improved","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128448426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acoustic Echo Cancellation with Adaptive Filtering
Ahmet Refit Kavsaoglu, Yuksel Ozbay
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659783
Echo is formed when the microphone picks up sound from the loudspeaker, and in voice transmission systems this echo severely degrades speech quality and intelligibility. The picked-up sound is sent to the loudspeaker again, and as this process repeats, the amplitude of the echo increases. This study aims to enhance speech intelligibility by cancelling out the echo. The data-transfer software required for real-time processing of voice signals and the adaptive filtering algorithm software for acoustic echo cancellation have been developed.
{"title":"Acoustic Echo Cancellation with Adaptive Filtering","authors":"Yanki Iptali, Ahmet Refit, Kavsaoglu Yuksel Ozbay","doi":"10.1109/SIU.2006.1659783","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659783","url":null,"abstract":"Because echo is formed through picking up sounds from the loud speaker by speaker microphone, echo noise in voice transmission systems severely degrades the speech quality and the speech intelligibility of speech signal. These picked sounds are sent to loudspeaker again. As a result of repetition of this process, the amplitude of the echo increases. In the study, it is aimed to enhance the intelligibility of speech by canceling out the echo noise. In this study, the data transfer software which is necessary for real time processing of voice signals and the adaptive filtering algorithm software for the application of acoustic echo cancellation have been developed","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128548468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection and Analysis of Quick Phase Eye Movements in Nystagmus (VNG)
A. Aydemir, A. Uneri
Pub Date: 2006-04-17 · DOI: 10.1109/SIU.2006.1659819
Horizontal, vertical and torsional nystagmus are measured using various methods. Videonystagmography (VNG), one of these methods, is based on recording the quick (fast) phase movements of the eye that occur during nystagmus. In this research, an infrared camera was employed to measure the three components of nystagmus in different patients using image-processing algorithms.
{"title":"Detection and Analysis of Quick Phase Eye Movements in Nystagmus (VNG)","authors":"A. Aydemir, A. Uneri","doi":"10.1109/SIU.2006.1659819","DOIUrl":"https://doi.org/10.1109/SIU.2006.1659819","url":null,"abstract":"Measuring horizontal, vertical and torsional nystagmus are performed using various methods. Videonystagmography (VNG), being one of these methods, is based on recording the high phase movements of the eye, which occurs during nystagmus. For this research, an infrared camera was employed to measure the three components of nystagmus on different patients, using image-processing algorithms","PeriodicalId":415037,"journal":{"name":"2006 IEEE 14th Signal Processing and Communications Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130140230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}