Selection of best match keyword using spoken term detection for spoken document indexing
Kentaro Domoto, T. Utsuro, N. Sawada, H. Nishizaki
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041589
This paper presents a novel keyword-selection-based spoken document indexing framework that selects the best-matching keyword from query candidates using spoken term detection (STD) for spoken document retrieval. Our method first creates a keyword set containing keywords that are likely to appear in a spoken document. Next, STD is conducted with all the keywords as query terms, yielding a detection result: a set of keywords and their detection intervals in the spoken document. For keywords with competing intervals, we rank the candidates by STD matching cost and select the best one, preferring the longest duration among competing detections. This final output of the STD process serves as an index word for the spoken document. The proposed framework was evaluated on lecture speeches as spoken documents in an STD task. The results show that our framework is quite effective at preventing false detection errors and at annotating spoken documents with keyword indices.
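As an illustration of the selection rule, here is a minimal Python sketch (not from the paper) that groups overlapping detection intervals and keeps the lowest-cost candidate, breaking ties by the longest duration; the Detection fields and the exact cost/duration ordering are assumptions about the method.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    keyword: str
    start: float  # interval start (seconds)
    end: float    # interval end (seconds)
    cost: float   # STD matching cost (lower is better)

def overlaps(a: Detection, b: Detection) -> bool:
    """Two detections compete if their intervals overlap in time."""
    return a.start < b.end and b.start < a.end

def select_best(detections):
    """Group competitive (overlapping) detections, then keep the
    lowest-cost candidate, breaking ties by the longest duration."""
    remaining = sorted(detections, key=lambda d: d.start)
    selected = []
    while remaining:
        seed = remaining.pop(0)
        group = [seed] + [d for d in remaining if overlaps(seed, d)]
        remaining = [d for d in remaining if not overlaps(seed, d)]
        best = min(group, key=lambda d: (d.cost, -(d.end - d.start)))
        selected.append(best)
    return selected

dets = [Detection("keyword", 1.0, 1.6, 0.30),
        Detection("keyboard", 1.1, 1.8, 0.30),
        Detection("index", 4.0, 4.5, 0.20)]
print([d.keyword for d in select_best(dets)])  # ['keyboard', 'index']
```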
Self-learning-based signal decomposition for multimedia applications: A review and comparative study
Li-Wei Kang, C. Yeh, Duan-Yu Chen, Chia-Tsung Lin
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041778
Decomposition of a signal (e.g., an image or video) into multiple semantic components has been an active research topic for various image/video processing applications, such as image/video denoising, enhancement, and inpainting. In this paper, we present a survey of signal decomposition frameworks based on sparsity and morphological diversity in signal mixtures, and of their applications in multimedia. First, we analyze existing MCA (morphological component analysis) based image decomposition frameworks together with their applications, and explore the potential limitations of these approaches for image denoising. Then, we discuss our recently proposed self-learning-based image decomposition framework and its applications to several image/video denoising tasks, including single-image rain streak removal, denoising, deblocking, and joint super-resolution and deblocking of highly compressed images/videos. Exploiting the sparse representation and morphological diversity of image signals, the proposed framework first learns an over-complete dictionary from the high-frequency part of an input image for reconstruction purposes. An unsupervised or supervised clustering technique is then applied to the dictionary atoms to identify the morphological component corresponding to the noise pattern of interest (e.g., rain streaks, blocking artifacts, or Gaussian noise). Unlike prior learning-based approaches, our method needs neither training data collected in advance nor image priors. Our experimental results confirm the effectiveness and robustness of the proposed framework, which has been shown to outperform state-of-the-art approaches.
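The decomposition pipeline can be sketched as follows, assuming standard scipy/scikit-learn components in place of the authors' implementation; the patch size, dictionary size, and two-cluster split are illustrative choices, and the criterion for labeling one cluster as the noise component is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.cluster import KMeans
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in for a noisy input image

# Split into a low-frequency (smooth) part and a high-frequency part
# that carries both image details and the noise pattern.
low = gaussian_filter(img, sigma=2.0)
high = img - low

# Learn an over-complete dictionary from high-frequency patches.
patches = extract_patches_2d(high, (8, 8), max_patches=500, random_state=0)
X = patches.reshape(len(patches), -1)
X -= X.mean(axis=1, keepdims=True)
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,
                                   random_state=0).fit(X)

# Unsupervised clustering of the atoms; one cluster would then be
# identified as the noise component (e.g., rain streaks) by a criterion
# such as dominant gradient direction, which is not shown here.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    dico.components_)
print("atoms per cluster:", np.bincount(labels))
```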
R-cube: A dialogue agent for restaurant recommendation and reservation
Seokhwan Kim, Rafael E. Banchs
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041732
This paper describes a hybrid dialogue system for restaurant recommendation and reservation. The proposed system combines rule-based and data-driven components in a flexible architecture that aims to reduce error propagation along the different steps of the dialogue management and processing pipeline. The system implements three basic subsystems for restaurant recommendation, selection, and booking, which share the same system architecture and processing components. The specific system described here operates on a data collection covering Singapore's F&B industry, but it can easily be adapted to any other city or location by simply replacing that data collection.
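A minimal sketch of a hybrid dispatcher of this kind, with hand-written rules taking priority and a statistical classifier stub as the data-driven fallback; the subsystem names follow the abstract, everything else is illustrative.

```python
def rule_based_intent(utterance: str):
    """Deterministic rules take priority; return None when no rule fires."""
    text = utterance.lower()
    if any(w in text for w in ("recommend", "suggest")):
        return "recommendation"
    if any(w in text for w in ("book", "reserve", "table for")):
        return "booking"
    return None

def statistical_intent(utterance: str):
    """Stand-in for a trained intent classifier (the data-driven component)."""
    return "selection"  # e.g., argmax over model posteriors

SUBSYSTEMS = {
    "recommendation": lambda u: "Here are some restaurants you may like...",
    "selection":      lambda u: "Which of these restaurants do you mean?",
    "booking":        lambda u: "Booking a table now...",
}

def respond(utterance: str) -> str:
    intent = rule_based_intent(utterance) or statistical_intent(utterance)
    return SUBSYSTEMS[intent](utterance)

print(respond("Can you recommend a good laksa place?"))
print(respond("Book a table for two at 7pm"))
```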
Real-time depth map generation using hybrid multi-view cameras
Yunseok Song, Dong-Won Shin, Eunsang Ko, Yo-Sung Ho
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041683
In this paper, we present a hybrid multi-view camera system for real-time depth generation. We set up eight color cameras and three depth cameras. For simple test scenarios, we capture a single object in a blue-screen studio. The objective is depth map generation at the eight color viewpoints. Due to hardware limitations, the depth cameras produce low-resolution images (176×144). Thus, we warp the depth data to the color camera views (1280×720) and then apply filtering. Joint bilateral filtering (JBF) is used to exploit range and spatial weights while also considering the color data. Simulation results show depth generation at 13 frames per second (fps) when treating the eight images as a single frame. When the proposed method is executed on one computer per depth camera, the speed can become three times faster. Thus, we have successfully achieved real-time depth generation using a hybrid multi-view camera system.
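For reference, a straightforward (unoptimized) joint bilateral filter, assuming a grayscale guidance image; the real-time system would use a far faster implementation, and the parameter values here are placeholders.

```python
import numpy as np

def joint_bilateral_filter(depth, color, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Filter a (warped) depth map using spatial closeness and color
    similarity in the registered color image as weights."""
    h, w = depth.shape
    out = np.zeros_like(depth)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    d = np.pad(depth, radius, mode="edge")
    c = np.pad(color, radius, mode="edge")
    for y in range(h):
        for x in range(w):
            dwin = d[y:y + 2*radius + 1, x:x + 2*radius + 1]
            cwin = c[y:y + 2*radius + 1, x:x + 2*radius + 1]
            # Range weight: penalize color differences across the window.
            rng_w = np.exp(-(cwin - color[y, x])**2 / (2 * sigma_r**2))
            wgt = spatial * rng_w
            out[y, x] = (wgt * dwin).sum() / wgt.sum()
    return out

depth = np.random.default_rng(0).random((32, 32))  # warped low-res depth
gray = np.random.default_rng(1).random((32, 32))   # registered guidance image
print(joint_bilateral_filter(depth, gray).shape)   # (32, 32)
```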
Robust anchorperson detection based on audio streams using a hybrid I-vector and DNN system
Yun-Fan Chang, Payton Lin, Shao-Hua Cheng, Kai-Hsuan Chan, Y. Zeng, Chia-Wei Liao, Wen-Tsung Chang, Y. Wang, Yu Tsao
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041717
Anchorperson segment detection enables efficient video content indexing for information retrieval. Anchorperson detection based on audio analysis has gained popularity owing to its lower computational complexity and satisfactory performance. This paper presents a robust framework that uses a hybrid I-vector and deep neural network (DNN) system to perform anchorperson detection on the audio streams of video content. The proposed system first applies I-vector extraction to obtain speaker identity features from the audio data. With the extracted speaker identity features, a DNN classifier is then used to verify the claimed anchorperson identity. In addition, subspace feature normalization (SFN) is incorporated into the hybrid system for robust feature extraction, compensating for the audio mismatch caused by recording devices. An anchorperson verification experiment was conducted to evaluate the equal error rate (EER) of the proposed hybrid system. Experimental results demonstrate that the proposed system outperforms a state-of-the-art hybrid I-vector and support vector machine (SVM) system. Moreover, integrating SFN further enhanced the proposed system by effectively compensating for audio mismatch in anchorperson detection tasks.
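A toy sketch of the verification stage, substituting random vectors for real i-vectors and scikit-learn's MLPClassifier for the authors' DNN; the EER computation follows the standard definition (the threshold where false-accept and false-reject rates meet).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-in 100-dim i-vectors: class 1 = anchorperson, class 0 = others.
X = np.vstack([rng.normal(0.0, 1.0, (200, 100)),
               rng.normal(0.5, 1.0, (200, 100))])
y = np.array([0] * 200 + [1] * 200)

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                    random_state=0).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

def equal_error_rate(scores, labels):
    """EER: operating point where false-accept rate ~= false-reject rate."""
    thresholds = np.sort(scores)
    far = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    frr = np.array([(scores[labels == 1] < t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2

print(f"EER = {equal_error_rate(scores, y):.3f}")
```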
Comparison the training methods of neural network for English and Thai character recognition
A. Saenthon, Natchanon Sukkhadamrongrak
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041795
Optical character recognition (OCR) is currently applied in many fields, such as reading office letters and reading serial numbers on industrial parts. Most manufacturers focus on the processing time and accuracy of the inspection process. The learning stage of optical character recognition uses a neural network to recognize fonts and correlate matching values. Neural networks can be trained with many different techniques, each of which affects processing time and accuracy. This paper therefore compares training procedures for neural networks that recognize both Thai and English characters. The experimental results report the error and processing time of each training technique.
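A comparison of this kind might look like the following sketch, which times several scikit-learn training solvers on a stand-in character dataset (handwritten digits, not Thai/English characters); the solvers and network size are illustrative, not the paper's.

```python
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: handwritten digit images instead of character images.
X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

for solver in ("lbfgs", "sgd", "adam"):  # candidate training methods
    clf = MLPClassifier(hidden_layer_sizes=(64,), solver=solver,
                        max_iter=1000, random_state=0)
    t0 = time.perf_counter()
    clf.fit(Xtr, ytr)
    elapsed = time.perf_counter() - t0
    err = 1.0 - clf.score(Xte, yte)
    print(f"{solver:6s} error={err:.3f} time={elapsed:.2f}s")
```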
Topic model allocation of conversational dialogue records by Latent Dirichlet Allocation
Jui-Feng Yeh, C. Lee, Yi-Shiuan Tan, Liang-Chih Yu
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041546
The topic information of conversational content is important for sustaining communication, so topic detection and tracking is an important research problem. Because topic shifts occur frequently in long conversations, and a conversation may contain many topics, it is important to detect the different topics in conversational content. This paper detects topic information using agglomerative clustering of utterances and a dynamic Latent Dirichlet Allocation topic model: it uses the proportions of verbs and nouns to measure similarity between utterances, and clusters all utterances in the conversational content with an agglomerative clustering algorithm. Since the topic structure of conversational content is fragile, we use speech act information and obtain hypernym information from E-HowNet, which makes the word categories robust. The Latent Dirichlet Allocation topic model is normally applied to whole documents and detects only a single topic when applied directly to conversational content, whereas conversational content frequently contains many topics; we therefore also use the speech act and hypernym information to train the latent Dirichlet allocation models, and then use the trained models to detect the different topics in conversational content. To evaluate the proposed method, a support vector machine is developed for comparison. The experimental results show that the proposed method outperforms the support vector machine approach for topic detection and tracking in spoken dialogue.
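A minimal sketch of the two stages, using plain bag-of-words features in place of the paper's verb/noun proportions and E-HowNet hypernym features; the cluster and topic counts are arbitrary.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import LatentDirichletAllocation

utterances = [
    "where should we eat tonight",
    "the new noodle place looks good",
    "did you finish the project report",
    "the report deadline is friday",
]

# In the paper, similarity is computed over verbs and nouns (with E-HowNet
# hypernyms); a plain bag-of-words representation stands in for that here.
vec = CountVectorizer()
X = vec.fit_transform(utterances)

# Stage 1: group utterances into topically coherent segments.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X.toarray())
print("utterance clusters:", labels)

# Stage 2: fit an LDA topic model over the utterances.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print("topic mixtures:\n", lda.transform(X).round(2))
```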
Phase detection of multi-channel SSVEPs via complex sparse spatial weighting
Keita Shimpo, Toshihisa Tanaka
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041666
A brain-computer interface (BCI) based on steady-state visual evoked potentials (SSVEPs) is one of the most practical BCIs because of its high recognition accuracy and short training time. The phase of SSVEPs can potentially be used to generate device commands. However, an effective method for estimating the phase of SSVEPs has not yet been established, especially when multi-channel electroencephalogram (EEG) recordings are used. In this paper, we propose a novel method for estimating the phase of SSVEPs from multi-channel EEG that uses complex sparse spatial weighting. We conducted experiments with a phase-coded SSVEP-based BCI to evaluate the performance of the proposed method. As a result, our proposed method showed higher recognition accuracies than conventional methods for all six subjects.
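To make the phase-estimation step concrete, here is a sketch on synthetic data: each channel's complex Fourier coefficient at the stimulus frequency is combined across channels, and the phase is read off the result. Uniform averaging stands in for the learned complex sparse spatial weights, which is an assumption, not the paper's method.

```python
import numpy as np

fs, f0, n_ch, n_samp = 256, 8.0, 4, 512   # sampling rate, stimulus frequency
t = np.arange(n_samp) / fs
rng = np.random.default_rng(0)
true_phase = np.pi / 3

# Synthetic multi-channel EEG: an SSVEP at f0 with channel-dependent
# gains plus additive noise.
gains = rng.uniform(0.5, 1.5, n_ch)
eeg = (gains[:, None] * np.sin(2 * np.pi * f0 * t + true_phase)
       + 0.5 * rng.standard_normal((n_ch, n_samp)))

# Complex Fourier coefficient at f0 for each channel, shape (n_ch,).
coeff = eeg @ np.exp(-2j * np.pi * f0 * t)

# Combine channels; uniform weights stand in for the learned sparse ones.
combined = coeff.mean()
est = np.angle(combined) + np.pi / 2   # sine carries a -pi/2 offset vs. cosine
print(f"true {true_phase:.3f}  estimated {est % (2 * np.pi):.3f}")
```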
Modeling spatial uncertainty of imprecise information in images
T. Pham
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041514
The description of information content in images is imprecise in nature. Quantification of uncertainty in images for pattern analysis has been addressed with the theories of probability and fuzzy sets. In this paper, an approach for modeling the spatial uncertainty of images is proposed in the setting of geostatistics and probability measure of fuzzy events. The proposed approach can be utilized to extract an effective feature for image classification.
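For the discrete case, Zadeh's probability of a fuzzy event is the expectation of the membership function under the probability distribution; a small example with made-up numbers:

```python
import numpy as np

# Zadeh's probability of a fuzzy event: P(A) = sum_x mu_A(x) * p(x).
# Illustrative fuzzy event "pixel is dark" over an 8-bin intensity
# histogram; both the histogram and memberships are hypothetical.
p = np.array([0.05, 0.10, 0.20, 0.25, 0.20, 0.10, 0.07, 0.03])  # histogram
mu_dark = np.array([1.0, 1.0, 0.8, 0.5, 0.2, 0.0, 0.0, 0.0])    # membership

assert np.isclose(p.sum(), 1.0)
print("P(dark) =", float(mu_dark @ p))  # expectation of the membership
```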
Contactless palmprint alignment based on intrinsic local affine-invariant feature points
C. Phromsuthirak, W. Tangsuksant, A. Sanpanich, C. Pintavirooj
Pub Date: 2014-12-01 | DOI: 10.1109/APSIPA.2014.7041563
The palmprint, a biometric characteristic, is mostly found in civil and commercial security applications because it is reliable and easy to capture with low-resolution devices. This paper develops a new contactless palmprint alignment method using a general USB camera on a tripod. The palmprint image is acquired by this camera and aligned using intrinsic, local, affine-invariant key points residing on the skin patches spanning two successive fingers. Because the key points are invariant to affine transformations, the algorithm needs no guidance pegs to fix the hand position during acquisition, thereby avoiding the scaling, translation, and rotation problems that hinder correct palmprint alignment. The developed algorithm was tested on 10 left-hand palmprint images collected from different subjects. The results indicate a distance map error of 1.4899 pixels.
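The alignment step reduces to estimating an affine transform from corresponding key points; a least-squares sketch (illustrative, not the paper's exact procedure):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points onto dst.
    src, dst: (N, 2) arrays of corresponding key points, N >= 3."""
    n = len(src)
    A = np.hstack([src, np.ones((n, 1))])  # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params.T                        # 2x3 affine matrix

rng = np.random.default_rng(0)
src = rng.random((6, 2)) * 100             # key points between two fingers
M_true = np.array([[0.9, -0.2, 5.0],
                   [0.1,  1.1, -3.0]])
dst = src @ M_true[:, :2].T + M_true[:, 2]
print(np.round(fit_affine(src, dst), 3))   # recovers M_true
```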