Davi Miara Kiapuchinski, C. Lima, Celso A. A. Kaestner
This paper proposes an approach for audio preprocessing and noise removal for recordings obtained in natural environments. The method is inspired by the acoustic signature of the audio and aims to preprocess recordings of bird songs obtained directly in the field. Using the Spectral Noise Gate technique, undesired noise is removed in real time during recording, in a real application running in an embedded environment. In addition, important statistical features of the audio signal are computed. The main purpose of the approach is to eliminate the tedious manual process of preparing field recordings so that they are ready to be used as input to other tasks, such as the automatic classification of bird species from recorded bird songs. This is necessary because classification results depend heavily on the quality of the input data.
{"title":"Spectral Noise Gate Technique Applied to Birdsong Preprocessing on Embedded Unit","authors":"Davi Miara Kiapuchinski, C. Lima, Celso A. A. Kaestner","doi":"10.1109/ISM.2012.12","DOIUrl":"https://doi.org/10.1109/ISM.2012.12","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
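The spectral noise gating named in the birdsong abstract above can be illustrated with a minimal sketch. This is not the authors' embedded implementation: the frame length, the `factor` threshold, the naive DFT, and all function names are assumptions chosen for illustration, and a steady hum stands in for background noise so that the example is deterministic.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def noise_profile(noise_frame):
    """Per-bin magnitude of a frame assumed to contain only background noise."""
    return [abs(c) for c in dft(noise_frame)]


def spectral_gate(frame, profile, factor=2.0):
    """Zero every frequency bin whose magnitude does not exceed
    factor * (noise magnitude in that bin); keep the other bins unchanged."""
    spectrum = dft(frame)
    gated = [c if abs(c) > factor * p else 0j for c, p in zip(spectrum, profile)]
    return idft(gated)


if __name__ == "__main__":
    n = 64
    # Hypothetical "birdsong": a pure tone in bin 4.
    tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
    # Hypothetical background noise: a weak hum in bin 13.
    hum = [0.05 * math.sin(2 * math.pi * 13 * t / n) for t in range(n)]
    profile = noise_profile(hum)
    cleaned = spectral_gate([a + b for a, b in zip(tone, hum)], profile)
    print(max(abs(c - s) for c, s in zip(cleaned, tone)))
```

A practical gate would typically process overlapping windowed frames with an FFT and smooth the gain across time and frequency; the sketch omits all of that to keep the gating step itself visible.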
Mahesh Babu Mariappan, Myunghoon Suk, B. Prabhakaran
In this paper, we present our approach for automatic facial expression recognition. We use a feature extraction technique inspired by our empirical study on human recognition of facial expressions. We propose our dual-layer hierarchical SVM ensemble mechanism for classification. We also provide system architecture and system implementation details in this paper.
{"title":"Facial Expression Recognition Using Dual Layer Hierarchical SVM Ensemble Classification","authors":"Mahesh Babu Mariappan, Myunghoon Suk, B. Prabhakaran","doi":"10.1109/ISM.2012.104","DOIUrl":"https://doi.org/10.1109/ISM.2012.104","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
A face tracking and recognition system has been developed based on deformable template matching [1]. Person-dependent deformable templates are used for recognition, and person-independent deformable templates for tracking. The computational load associated with recognition is greater than that associated with tracking, because the number of person-dependent templates that must be deformed and matched against the input frame is equal to the number of registered individuals, whereas there is only a single person-independent template (per pose cell) for tracking. In this work, we show how person-independent templates can be used for recognition as well as tracking, resulting in a substantial reduction in the computation associated with recognition in the system of [1] (and potentially by extension in similar systems), at relatively small cost in recognition performance.
{"title":"Person-Independent Deformable Templates for Fast Face Recognition","authors":"S. Clippingdale, Mahito Fujii","doi":"10.1109/ISM.2012.89","DOIUrl":"https://doi.org/10.1109/ISM.2012.89","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
This paper presents a novel approach for multi-feature information fusion. The proposed method is based on the Discriminative Multiple Canonical Correlation Analysis (DMCCA), which can extract more discriminative characteristics for recognition from multi-feature information representation. It represents the different patterns among multiple subsets of features identified by minimizing the Frobenius norm. We will demonstrate that the Canonical Correlation Analysis (CCA), the Multiple Canonical Correlation Analysis (MCCA), and the Discriminative Canonical Correlation Analysis (DCCA) are special cases of the DMCCA. The effectiveness of the DMCCA is demonstrated through experimentation in speaker recognition and speech-based emotion recognition. Experimental results show that the proposed approach outperforms the traditional methods of serial fusion, CCA, MCCA and DCCA.
{"title":"Discriminative Multiple Canonical Correlation Analysis for Multi-feature Information Fusion","authors":"Lei Gao, L. Qi, E. Chen, L. Guan","doi":"10.1109/ISM.2012.15","DOIUrl":"https://doi.org/10.1109/ISM.2012.15","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
M. Winkler, Kai Michael Höver, Aristotelis Hadjakos, M. Mühlhäuser
Today, talks, presentations, and lectures are often captured on video to give a broad audience the possibility to (re-)access the content. Because presenters often move around during a talk, the recording cameras must be guided. We present an automatic solution for user tracking and camera control. It uses a depth camera for user tracking and a scalable networking architecture based on publish/subscribe messaging for controlling multiple video cameras. Furthermore, we present our experiences with the system during actual lectures at a university.
{"title":"Automatic Camera Control for Tracking a Presenter during a Talk","authors":"M. Winkler, Kai Michael Höver, Aristotelis Hadjakos, M. Mühlhäuser","doi":"10.1109/ISM.2012.96","DOIUrl":"https://doi.org/10.1109/ISM.2012.96","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
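The publish/subscribe arrangement described in the camera-control abstract above can be sketched as follows. The sketch assumes a single in-process broker, and every name in it is hypothetical (the `Broker` and `Camera` classes, the topic string, the message fields `x` and `z`, and the pan geometry); the abstract does not describe the actual system's networking layer or message format.

```python
import math
from collections import defaultdict


class Broker:
    """Minimal in-process publish/subscribe broker (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all callbacks subscribed to the topic."""
        for callback in list(self._subscribers.get(topic, [])):
            callback(message)


class Camera:
    """Hypothetical camera mounted at x = position_x on the z = 0 line,
    facing along +z. It pans toward the presenter position it receives."""

    def __init__(self, name, position_x):
        self.name = name
        self.position_x = position_x
        self.pan_degrees = 0.0

    def on_presenter_position(self, message):
        dx = message["x"] - self.position_x  # sideways offset to presenter
        dz = message["z"]                    # distance in front of the cameras
        self.pan_degrees = math.degrees(math.atan2(dx, dz))


if __name__ == "__main__":
    broker = Broker()
    cameras = [Camera("left", -1.0), Camera("right", 1.0)]
    for camera in cameras:
        broker.subscribe("presenter/position", camera.on_presenter_position)

    # The tracker (a depth camera in the abstract) would publish positions.
    broker.publish("presenter/position", {"x": 0.0, "z": 1.0})
    for camera in cameras:
        print(camera.name, camera.pan_degrees)
```

The appeal of this pattern is that the tracker does not need to know how many cameras exist: adding a camera only requires one more subscription.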
Inaccurate image segmentation often has a negative impact on object-based image retrieval. Researchers have attempted to alleviate this problem by using hierarchical image representation. However, these attempts suffer from inefficiency in building the hierarchical image representation and from high computational complexity in matching two hierarchically represented images. Existing approaches construct the hierarchical image representation in two steps. The first step is to perform segmentation at different image resolutions, and the second step is to construct a hierarchical representation of the image by associating segments from different resolutions. In this research, an innovative all-in-one run approach is proposed that concurrently performs image segmentation and hierarchical tree construction, producing a hierarchical region tree to represent the image. In addition, an efficient hierarchical region tree matching algorithm with a reasonably low time complexity is proposed and used in multiple-object image retrieval. The experimental results demonstrate the efficacy and efficiency of the proposed approach.
{"title":"Segmentation Tree Based Multiple Object Image Retrieval","authors":"Wei-bang Chen, Chengcui Zhang, Song Gao","doi":"10.1109/ISM.2012.49","DOIUrl":"https://doi.org/10.1109/ISM.2012.49","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
Finding piano sheet music of appropriate difficulty for a learner is an important task for a piano teacher. In this paper, we study the new and challenging problem of recognizing the difficulty level of piano sheet music. To analyze the semantic content of music, we focus on symbolic music, i.e., sheet music or scores. Specifically, difficulty level recognition is formulated as a regression problem that predicts the difficulty level of piano sheet music. Since existing symbolic music features are not able to capture the characteristics of difficulty, we propose a set of new features. To improve performance, a feature selection approach, RReliefF, is used to select relevant features. An extensive performance study is conducted over two real datasets with different characteristics to evaluate the accuracy of the regression approach for predicting difficulty level. The best performance, evaluated in terms of the R² statistic, reaches 39.9% and 38.8% on the two datasets, respectively.
{"title":"A Study on Difficulty Level Recognition of Piano Sheet Music","authors":"Shih-Chuan Chiu, Min-Syan Chen","doi":"10.1109/ISM.2012.11","DOIUrl":"https://doi.org/10.1109/ISM.2012.11","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
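To make the reported metric in the piano-difficulty abstract above concrete, the sketch below computes R² as the coefficient of determination, 1 - SS_res / SS_tot. That the paper uses this particular definition is an assumption, and the difficulty levels and predictions in the usage lines are invented for illustration only; they are not data from the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    actual and predicted are equal-length sequences of numbers, and
    actual must not be constant (otherwise SS_tot is zero).
    """
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the regression.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented difficulty levels and regression outputs (illustrative only).
    true_levels = [1, 2, 3, 4, 5]
    predicted_levels = [2, 2, 3, 4, 4]
    print(r_squared(true_levels, predicted_levels))
```

Under this definition, a value such as 39.9% corresponds to R² = 0.399, meaning the regression accounts for roughly 40% of the variance in the difficulty labels.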
Michael Karl, Tatiana Polishchuk, T. Herfet, A. Gurtov
Internet multimedia traffic currently accounts for more than half of total Internet traffic and continues to grow rapidly. To meet the strict constraints imposed by real-time multimedia applications, appropriate error-correction techniques should be implemented within the data dissemination network. We propose introducing multipurpose relay nodes, called Mediators, at several positions within the tree networks typical of multicasting and broadcasting scenarios. By combining the error-correction domain separation paradigm with selective insertion of supplementary data from parallel networks, when the corresponding content is available, the proposed mechanism reduces the total network load and improves the scalability of multicast/broadcast transmission. We share our view on how existing application frameworks could benefit from incremental deployment of the proposed mechanism. Experimental results confirm the suitability and applicability of our assumptions.
{"title":"Mediating Multimedia Traffic with Strict Delivery Constraints","authors":"Michael Karl, Tatiana Polishchuk, T. Herfet, A. Gurtov","doi":"10.1109/ISM.2012.53","DOIUrl":"https://doi.org/10.1109/ISM.2012.53","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
Lei Gao, L. Qi, Shou-yi Yang, Yongjin Wang, Tie Yun, L. Guan
The extraction of rotation-invariant representations is important for many signal processing problems such as image analysis, computer vision, and pattern recognition. In this paper, we present a systematic analysis of the Two-Dimensional Fractional Fourier Transform (2D-FRFT) and show that, under certain conditions, the 2D-FRFT possesses the attractive property of rotation invariance. Based on our analysis, we propose a novel digital image watermarking method that combines a 2D chirp signal with the addition and rotation invariance properties of the 2D-FRFT to achieve improved robustness and security. The effectiveness of the proposed solution is demonstrated through experiments.
{"title":"2D-FRFT Based Rotation Invariant Digital Image Watermarking","authors":"Lei Gao, L. Qi, Shou-yi Yang, Yongjin Wang, Tie Yun, L. Guan","doi":"10.1109/ISM.2012.60","DOIUrl":"https://doi.org/10.1109/ISM.2012.60","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}
This paper presents an efficient image alignment and enhancement method for multi-sensor images. The shape of an object captured in one multi-sensor image can be found in the other through edges that are similar but differ in contrast. Using this cue, our approach is based on the magnitudes of the oriented edges and yields a fast alignment method based on feature matching between multi-sensor images. To enhance the image using the aligned multi-sensor images, we estimate a salient region mask that covers the information of all input images. Our experimental results show that the proposed method aligns multi-sensor images efficiently and enhances them better than current methods.
{"title":"Feature-Based Multi-sensor Images Alignment and Enhancement","authors":"Myung-Ho Ju, Sung-Yong Kim, Hang-Bong Kang","doi":"10.1109/ISM.2012.45","DOIUrl":"https://doi.org/10.1109/ISM.2012.45","journal":{"name":"2012 IEEE International Symposium on Multimedia"},"publicationDate":"2012-12-10"}