Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168185
C. Ge, I. Gu, Jie Yang
This paper addresses human fall detection from videos. Rather than using handcrafted features as in conventional machine learning, we extract features from convolutional neural networks (CNNs). Similar to much existing work using two-stream inputs, we feed a spatial CNN stream with raw image differences and a temporal CNN stream with optical flow. Unlike conventional two-stream action recognition work, we apply sparse representation with residual-based pooling to the CNN-extracted features to obtain more discriminative feature codes. To characterize the sequential information in video activity, we build a long-range dynamic feature representation by concatenating segment-level codes, which serves as the input to an SVM classifier. Experiments were conducted on two public video databases for fall detection; comparisons with six existing methods show the effectiveness of the proposed method.
"Human fall detection using segment-level CNN features and sparse dictionary learning," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
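The segment-level coding pipeline described above (per-frame codes, pooled within temporal segments, concatenated into one descriptor for the classifier) can be sketched as follows. The dictionary `D`, the soft-thresholding coder, and the max-pooling rule are illustrative stand-ins, not the paper's learned dictionary or residual-based pooling:

```python
import numpy as np

def sparse_code(x, D, lam=0.1):
    """Crude sparse code: least-squares projection onto the dictionary,
    followed by soft-thresholding (a stand-in for dictionary-learning-based coding)."""
    c = np.linalg.lstsq(D, x, rcond=None)[0]
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def segment_descriptor(frames, D, n_segments=3):
    """Max-pool codes within each temporal segment, then concatenate segments."""
    codes = np.stack([sparse_code(f, D) for f in frames])
    segments = np.array_split(codes, n_segments)
    return np.concatenate([seg.max(axis=0) for seg in segments])

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 16))        # toy dictionary: 16 atoms for 64-dim CNN features
frames = rng.normal(size=(30, 64))   # 30 per-frame CNN feature vectors
desc = segment_descriptor(frames, D)
assert desc.shape == (3 * 16,)       # one pooled code per segment, concatenated
```

The concatenated descriptor `desc` is what would be handed to the SVM.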
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168135
D. Fantinato, A. Neves, D. G. Silva, R. Attux
In communication systems, the study of elements and structures defined over Galois fields is generally limited to data coding. In this work, however, a novel perspective that combines data coding and channel equalization is considered in order to compose a simplified communication system over such a field. Besides the coding advantages, this framework is able to correct distortions caused by the channel or by malfunctioning processes, and can potentially be applied in network coding models. Interestingly, the equalizer can operate blindly by exploiting the redundant information introduced by the encoder. More specifically, we define a blind equalization criterion based on matching probability mass functions (PMFs) via the Kullback-Leibler divergence. Simulations covering the main aspects of the equalizer and the criterion are performed, including the use of a genetic algorithm to aid the search for the solution, with promising results.
"Blind channel equalization of encoded data over Galois fields," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
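The PMF-matching criterion can be illustrated with a toy example: the KL divergence between the empirical PMF of the equalizer output and the PMF that the encoder's redundancy should induce is small only when equalization has succeeded. The field size and target PMF below are invented for illustration:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two PMFs given as arrays."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def empirical_pmf(symbols, field_size):
    counts = np.bincount(symbols, minlength=field_size)
    return counts / counts.sum()

# Toy blind criterion over GF(4): compare the PMF the encoder's redundancy
# induces with the empirical PMF of candidate equalizer outputs.
rng = np.random.default_rng(0)
target_pmf = np.array([0.4, 0.3, 0.2, 0.1])    # assumed encoder-induced PMF
good = rng.choice(4, size=5000, p=target_pmf)  # well-equalized symbol stream
bad = rng.choice(4, size=5000)                 # badly equalized: near-uniform

cost_good = kl_divergence(empirical_pmf(good, 4), target_pmf)
cost_bad = kl_divergence(empirical_pmf(bad, 4), target_pmf)
assert cost_good < cost_bad  # the criterion prefers the matched PMF
```

A search procedure (such as the genetic algorithm mentioned in the abstract) would minimize this cost over candidate equalizers.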
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168187
Karl Øyvind Mikalsen, F. Bianchi, C. Soguero-Ruíz, R. Jenssen
This paper presents the time series cluster kernel (TCK) for multivariate time series with missing data. Our approach leverages the missing-data handling properties of Gaussian mixture models (GMMs) augmented with empirical prior distributions. Further, we exploit an ensemble learning approach to ensure robustness to parameter choices by combining the clustering results of many GMMs to form the final kernel. In comparative experiments, we demonstrate that the TCK is robust to parameter choices and illustrate its ability to deal with multivariate time series, both with and without missing data.
"The time series cluster kernel," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
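The ensemble step (combining posterior assignments from many clusterings into one kernel) can be sketched as below. The soft-assignment routine is a lightweight stand-in for the fitted GMM posteriors the paper uses, and the averaging over randomized clusterings mirrors the ensemble idea:

```python
import numpy as np

def soft_assign(X, centers, gamma=1.0):
    """Soft cluster responsibilities (a stand-in for GMM posteriors)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    w = np.exp(-gamma * d2)
    return w / w.sum(axis=1, keepdims=True)

def ensemble_kernel(X, n_clusterings=20, n_clusters=3, seed=0):
    """Average inner products of cluster posteriors over many randomized clusterings."""
    rng = np.random.default_rng(seed)
    n = len(X)
    K = np.zeros((n, n))
    for _ in range(n_clusterings):
        centers = X[rng.choice(n, n_clusters, replace=False)]
        Q = soft_assign(X, centers, gamma=rng.uniform(0.5, 2.0))
        K += Q @ Q.T  # samples with similar posteriors get high similarity
    return K / n_clusterings

X = np.random.default_rng(1).normal(size=(30, 5))
K = ensemble_kernel(X)
assert np.allclose(K, K.T)                    # symmetric
assert np.all(np.linalg.eigvalsh(K) > -1e-9)  # positive semi-definite
```

Because each term `Q @ Q.T` is positive semi-definite, so is the averaged kernel, which makes it directly usable in kernel methods.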
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168186
Alan Joseph Bekker, M. Chorev, L. Carmel, J. Goldberger
An appreciable fraction of introns is thought to be involved in cellular functions, but there is no obvious way to predict which specific introns are likely to be functional. For each intron we are given a feature representation based on its evolutionary patterns. For a small subset of introns we are also given an indication that they are functional; for all other introns it is unknown whether they are functional or not. Our task is to estimate what fraction of introns are functional and how likely it is that each individual intron is functional. We define a probabilistic classification model that treats the given functionality labels as noisy versions of labels produced by a deep neural network. The maximum-likelihood model parameters are found using the Expectation-Maximization algorithm. We show that roughly 80% of the functional introns are still not recognized as such, and that roughly a third of all introns are functional.
"A deep neural network with a restricted noisy channel for identification of functional introns," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
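The core estimation idea (using EM to recover the weight of a hidden "functional" component from observations whose labels are unreliable) can be illustrated on a much simpler model. The sketch below fits a plain two-component 1-D Gaussian mixture by EM, not the paper's DNN-with-noisy-channel model; scores, means, and the 30% ground-truth fraction are invented:

```python
import numpy as np

def em_mixture_fraction(x, steps=100):
    """EM for a two-component 1-D Gaussian mixture; returns the weight of
    the upper component (standing in here for the functional fraction)."""
    mu = np.array([x.min(), x.max()])
    sig = np.array([x.std(), x.std()])
    pi = 0.5
    for _ in range(steps):
        def pdf(m, s):
            return np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))
        # E-step: responsibility of the upper component for each sample
        r = pi * pdf(mu[1], sig[1]) / (pi * pdf(mu[1], sig[1]) + (1 - pi) * pdf(mu[0], sig[0]))
        # M-step: re-estimate mixture weight, means, and variances
        pi = r.mean()
        mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                       np.sum(r * x) / np.sum(r)])
        sig = np.sqrt(np.array([np.sum((1 - r) * (x - mu[0]) ** 2) / np.sum(1 - r),
                                np.sum(r * (x - mu[1]) ** 2) / np.sum(r)]))
    return pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 700), rng.normal(4, 1, 300)])  # 30% "functional"
pi = em_mixture_fraction(x)
assert 0.2 < pi < 0.4  # EM recovers roughly the planted fraction
```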
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168176
K. Mentl, B. Mailhé, Florin C. Ghesu, Frank Schebesch, T. Haderlein, A. Maier, M. Nadar
This article presents a novel neural-network-based approach for the enhancement of 3D medical image data. The proposed networks learn a sparse representation basis by mapping the corrupted input data to corresponding optimal targets. To reinforce the adjustment of the network to the given data, the threshold values are also learned adaptively. In order to capture important image features at various scales and process large computed tomography (CT) volumes in reasonable time, a multiscale approach is applied: recursively downsampled versions of the input are used, and denoising operators of constant size are learned at each scale. The networks are trained end-to-end from a database of real high-dose acquisitions, with synthetic noise added to simulate the corresponding low-dose scans. Both 2D and 3D networks are evaluated on CT volumes and compared to the block-matching and 3D filtering (BM3D) algorithm. The presented methods achieve an increase of 4% to 11% in SSIM and of 2.4 to 2.8 dB in PSNR with respect to the ground truth, outperform BM3D in quantitative comparisons, and exhibit no visible texture artifacts. By exploiting volumetric information, 3D networks achieve superior results over 2D networks.
"Noise reduction in low-dose CT using a 3D multiscale sparse denoising autoencoder," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
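The learned-threshold shrinkage at the heart of such sparse denoising autoencoders can be sketched in a single layer: analyze the signal in a basis, soft-threshold the coefficients, and synthesize back. The orthonormal basis and the fixed per-atom thresholds below stand in for quantities the networks learn end-to-end:

```python
import numpy as np

def soft_threshold(x, t):
    """Shrinkage nonlinearity with per-coefficient thresholds t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
W = np.linalg.qr(rng.normal(size=(64, 64)))[0]  # orthonormal analysis basis (learned in the paper)
t = np.full(64, 0.5)                            # per-atom thresholds (also learned in the paper)

# Toy signal that is sparse in the basis W, plus additive noise
c = np.zeros(64)
c[::8] = 3.0
clean = W @ c
noisy = clean + 0.3 * rng.normal(size=64)

# One-layer autoencoder pass: analyze, shrink, synthesize
denoised = W @ soft_threshold(W.T @ noisy, t)
assert np.linalg.norm(denoised - clean) < np.linalg.norm(noisy - clean)
```

The multiscale version of the paper applies operators like this at several recursively downsampled resolutions.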
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168145
E. Nadimi, V. Blanes-Vidal
In this paper, we regard an absorbing inhomogeneous medium as an assembly of thin layers with different propagation properties. We derive a stochastic model for the refractive index and formulate localisation from noisy distance measurements as a graph realisation problem. We relax the problem using a semi-definite programming (SDP) approach in the lp realisation domain and derive upper bounds, following the Edmundson-Madansky bound of order 6p (EM6p), on the SDP objective function in order to estimate the technique's localisation accuracy. Our results show that the inhomogeneity of the medium and the choice of lp norm have a significant impact on the ratio of the expected localisation error to the upper bound on the expected optimal SDP objective value. The tightest ratio is obtained with the l∞ norm.
"Upper bound performance of semi-definite programming for localisation in inhomogeneous media," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168106
Tran Duy Linh, Masayuki Arai
In the present paper, we propose a deep network architecture to improve the accuracy of pedestrian detection. The proposed method contains a proposal network and a classification network that are trained separately. We use a single shot multibox detector (SSD) as the proposal network to generate a set of pedestrian proposals. The proposal network is fine-tuned from a pre-trained network on several pedestrian data sets with a large input size (512 × 512 pixels) in order to improve detection accuracy for small pedestrians. A classification network then classifies the pedestrian proposals, and we combine the scores from the proposal network and the classification network to obtain better final detection scores. Experiments were conducted on the Caltech test set; compared to other state-of-the-art pedestrian detection methods, the proposed method obtains better results for small pedestrians (30 to 50 pixels in height), with an average miss rate of 42%.
"A two-stage training deep neural network for small pedestrian detection," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
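The score-combination step can be sketched as below. The abstract states that the two stages' scores are combined, but the specific fusion rule here (a weighted geometric mean) is an assumption made for illustration:

```python
import numpy as np

def fuse_scores(proposal_scores, classifier_scores, alpha=0.5):
    """Weighted geometric-mean fusion of the two stages' confidences.
    (An assumed rule; the paper's exact combination may differ.)"""
    p = np.clip(proposal_scores, 1e-6, 1.0)
    c = np.clip(classifier_scores, 1e-6, 1.0)
    return p ** alpha * c ** (1.0 - alpha)

# Three candidate boxes: agreed-on pedestrian, proposal-only hit, weak everywhere
p = np.array([0.9, 0.8, 0.3])
c = np.array([0.95, 0.2, 0.4])
fused = fuse_scores(p, c)
assert fused[0] > fused[1]  # both stages agreeing beats one high score alone
```

A geometric mean penalizes disagreement between the stages, which is the usual motivation for this kind of fusion.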
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168157
Z. Gong, P. Zhong, Yang Yu, Jiaxin Shan, W. Hu
Due to the high spectral resolution and the similarity of some spectra between different classes, hyperspectral image classification is an important but challenging task. Research has shown the power of deep learning for hyperspectral image classification; however, the lack of training samples makes it difficult to extract discriminative features and achieve the expected performance. To address this problem, a multi-scale CNN that can extract multi-scale features is designed for hyperspectral image classification. Furthermore, D-DSML, a diversified metric, is proposed to further improve the representational ability of deep methods. In this paper, a D-DSML-MSCNN method, which jointly learns deep multi-scale features and diversified metrics for hyperspectral image classification, is proposed to combine the advantages of D-DSML and MSCNN. Experiments conducted on the Pavia University data show the effectiveness of our method for hyperspectral image classification and its advantage over other recent results.
"Joint learning of deep multi-scale features and diversified metrics for hyperspectral image classification," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168109
Laurens Bliek, M. Verhaegen, S. Wahls
We propose CDONE, a convex version of the DONE algorithm. DONE is a derivative-free online optimization algorithm that uses surrogate modeling with noisy measurements to find a minimum of objective functions that are expensive to evaluate. Inspired by their success in deep learning, CDONE makes use of rectified linear units, together with a nonnegativity constraint, to enforce convexity of the surrogate model. This leads to a sparse, cheap-to-evaluate surrogate model of the unknown optimization objective that is still accurate and can be minimized with convex optimization algorithms. The CDONE algorithm is demonstrated on a toy example and on hyper-parameter optimization for a deep learning example on handwritten digit classification.
"Online function minimization with convex random ReLU expansions," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
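The core construction (a surrogate built from ReLU features whose outer weights are constrained nonnegative, which keeps the surrogate convex in its input, since a nonnegative combination of convex functions is convex) can be sketched as follows. The random features, the projected-gradient fit, and the test objective are illustrative, not the DONE/CDONE update rules:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random ReLU features; each ReLU(a.x + b) is convex in x,
# so any nonnegative combination of them is convex too.
A = rng.normal(size=(50, 2))
b = rng.normal(size=50)

def features(x):
    return np.maximum(A @ x + b, 0.0)

def fit_nonneg(X, y, steps=2000, lr=1e-3):
    """Least-squares fit of the outer weights with projection onto w >= 0."""
    Phi = np.array([features(x) for x in X])
    w = np.zeros(A.shape[0])
    for _ in range(steps):
        w -= lr * Phi.T @ (Phi @ w - y) / len(y)
        w = np.maximum(w, 0.0)  # projection enforces convexity of the surrogate
    return w

# Noisy samples of a convex objective f(x) = ||x||^2
X = rng.normal(size=(200, 2))
y = (X ** 2).sum(axis=1) + 0.01 * rng.normal(size=200)
w = fit_nonneg(X, y)
assert np.all(w >= 0)

# Convexity check along one segment: midpoint value <= average of endpoints
x1, x2 = np.array([1.0, 0.0]), np.array([-1.0, 1.0])
f = lambda x: w @ features(x)
assert f((x1 + x2) / 2) <= 0.5 * (f(x1) + f(x2)) + 1e-9
```

The resulting surrogate `f` can then be handed to any convex optimizer, which is the property CDONE exploits.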
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168118
Anshul Thakur, V. Abrol, Pulkit Sharma, Padmanabhan Rajan
In this paper we describe a semi-supervised algorithm for segmenting bird vocalizations using matrix factorization and Rényi-entropy-based mutual information. Singular value decomposition (SVD) is applied to pooled time-frequency representations of bird vocalizations to learn basis vectors, and a compact feature representation of the input test data is obtained using only a few of the bases. Rényi-entropy-based mutual information is then calculated between the feature representations of consecutive frames. After simple post-processing, a threshold is used to reliably distinguish bird vocalizations from other sounds. The algorithm is evaluated on field recordings of different bird species under different SNR conditions. The results highlight the effectiveness of the proposed method in all SNR conditions, its improvements over other methods, and its generality.
"Rényi entropy based mutual information for semi-supervised bird vocalization segmentation," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
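A minimal sketch of the feature pipeline: SVD-based compact features per frame, then a Rényi-order-2-based dissimilarity between consecutive frames that can be thresholded. The Cauchy-Schwarz divergence used here is one Rényi-entropy-based quantity; the paper's exact mutual-information estimator may differ, and the spectrogram is a random toy:

```python
import numpy as np

def svd_features(S, k=4):
    """Project spectrogram frames onto the top-k left singular vectors."""
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    return np.abs(U[:, :k].T @ S)  # (k, n_frames) compact representation

def cs_divergence(p, q, eps=1e-12):
    """Cauchy-Schwarz divergence, a Rényi-order-2-based dissimilarity
    between nonnegative feature vectors treated as PMFs."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(-np.log((p @ q) ** 2 / ((p @ p) * (q @ q)) + eps))

rng = np.random.default_rng(0)
S = np.abs(rng.normal(size=(32, 100)))  # toy magnitude spectrogram (bins x frames)
F = svd_features(S)

# Dissimilarity trace between consecutive frames; after smoothing,
# a threshold on this trace separates vocalization onsets from background.
d = np.array([cs_divergence(F[:, t], F[:, t + 1]) for t in range(F.shape[1] - 1)])
assert d.shape == (99,)
assert cs_divergence(F[:, 0], F[:, 0]) < 1e-6  # identical frames: near-zero divergence
```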