Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6853980
Brian McFee, D. Ellis
Onset detection forms the critical first stage of most beat tracking algorithms. While common spectral-difference onset detectors can work well in genres with clear rhythmic structure, they can be sensitive to loud, asynchronous events (e.g., off-beat notes in a jazz solo), which limits their general efficacy. In this paper, we investigate methods to improve the robustness of onset detection for beat tracking. Experimental results indicate that simple modifications to onset detection can produce large improvements in beat tracking accuracy.
Title: Better beat tracking through robust onset aggregation | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2154-2158 | URL: https://doi.org/10.1109/ICASSP.2014.6853980
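To make the sensitivity to loud asynchronous events concrete, here is a minimal sketch that assumes "robust aggregation" means replacing the mean across frequency bins of a spectral-flux onset detector with the median. The function name `onset_strength` and the toy spectrogram are hypothetical illustrations, not the authors' code.

```python
import statistics

def onset_strength(log_spec, aggregate):
    """Spectral-flux onset envelope from a log-magnitude spectrogram.

    log_spec:  list of frames; each frame is a list of per-bin log magnitudes.
    aggregate: reduces one frame's per-bin differences to a scalar,
               e.g. statistics.mean or statistics.median (assumed choice).
    Returns one value per pair of consecutive frames.
    """
    envelope = []
    for prev, cur in zip(log_spec, log_spec[1:]):
        # Half-wave rectification: only energy increases count as onset evidence.
        diffs = [max(0.0, c - p) for p, c in zip(prev, cur)]
        envelope.append(aggregate(diffs))
    return envelope

# Hypothetical 8-bin spectrogram:
# frame 2 has a broadband onset (every bin rises by 1),
# frame 4 has a loud narrowband event (one bin rises by 8).
silence = [0.0] * 8
frames = [silence, silence, [1.0] * 8, silence, [8.0] + [0.0] * 7, silence]

mean_env = onset_strength(frames, statistics.mean)
median_env = onset_strength(frames, statistics.median)
print(mean_env)    # [0.0, 1.0, 0.0, 1.0, 0.0]: both events look equally strong
print(median_env)  # [0.0, 1.0, 0.0, 0.0, 0.0]: the narrowband event is suppressed
```

With mean aggregation the single loud bin produces as much onset evidence as a genuine broadband onset; the median ignores it because fewer than half of the bins changed.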
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6855001
K. Wagner, M. Doroslovački
We present diffusion strategies for learning across networks that minimize the transient-regime mean-square deviation (MSD) across all nodes. Choosing the combination coefficients that minimize the MSD at every time instant leads to a quadratic program with linear constraints. Implementing the optimal procedure requires estimates of the weight-deviation vectors, for which an algorithm is proposed. An optimization with relaxed constraints is also considered. The proposed methods were validated through simulations with different estimation distribution strategies and input signals. The results show potential for a significant improvement in convergence speed.
Title: Combination coefficients for fastest convergence of distributed LMS estimation | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7218-7222 | URL: https://doi.org/10.1109/ICASSP.2014.6855001
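For readers unfamiliar with diffusion LMS, the sketch below shows the generic adapt-then-combine (ATC) form in which combination coefficients appear. It assumes fixed, uniform coefficients on a 4-node ring; the paper's contribution, choosing these coefficients optimally over time, is not implemented here. All names and parameter values are hypothetical.

```python
import random

def atc_diffusion_lms(A, w_true, mu=0.05, noise_std=0.01, iters=2000, seed=0):
    """Adapt-then-combine diffusion LMS on a network of len(A) nodes.

    A[k][l]: weight node k gives to neighbor l's intermediate estimate
             (each row sums to 1; zero means no link). Fixed here (assumption).
    w_true:  unknown parameter vector every node tries to estimate.
    """
    rng = random.Random(seed)
    n, m = len(A), len(w_true)
    w = [[0.0] * m for _ in range(n)]
    for _ in range(iters):
        # Adapt: each node takes one LMS step using its own fresh data.
        psi = []
        for k in range(n):
            u = [rng.gauss(0.0, 1.0) for _ in range(m)]
            d = sum(ui * wi for ui, wi in zip(u, w_true)) + rng.gauss(0.0, noise_std)
            err = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            psi.append([wi + mu * err * ui for wi, ui in zip(w[k], u)])
        # Combine: each node forms a weighted average of neighbors' estimates.
        w = [[sum(A[k][l] * psi[l][j] for l in range(n)) for j in range(m)]
             for k in range(n)]
    return w

# Ring of 4 nodes: each node averages itself and its two neighbors equally.
t = 1.0 / 3.0
A = [[t, t, 0.0, t],
     [t, t, t, 0.0],
     [0.0, t, t, t],
     [t, 0.0, t, t]]
w_true = [0.5, -1.0, 2.0]
estimates = atc_diffusion_lms(A, w_true)
for node in estimates:
    print([round(v, 3) for v in node])
```

Replacing the constant matrix `A` with time-varying coefficients is exactly the degree of freedom the abstract optimizes to speed up the transient phase.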
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854337
J. Dohl, G. Fettweis
Nonlinear distortion in analog frontends is a growing problem that is not limited to power amplifiers. Modern modulation methods such as OFDM and next-generation standards impose high linearity requirements on all components in the signal path. A radio system that can tolerate a certain degree of nonlinear distortion without substantial loss of performance could enable large cost savings in development and production. In this paper we present a novel iterative blind estimator for nonlinear distortions. It complements existing mitigation algorithms by providing them with accurate estimates of the nonlinearity characteristic. We show that the performance gap between perfect and estimated knowledge of the nonlinearity is negligible. The method is designed to be computationally inexpensive and can be readily implemented on today's digital signal processing systems.
Title: Iterative blind estimation of nonlinear channels | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3923-3927 | URL: https://doi.org/10.1109/ICASSP.2014.6854337
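As a rough illustration of what "iterative blind estimation" of a nonlinearity can look like, the sketch below alternates between detecting symbols with the current model and refitting the model by least squares (a generic decision-directed scheme, not the authors' estimator). The memoryless cubic model, the 4-PAM constellation and all parameter values are assumptions chosen for the example.

```python
import random

CONSTELLATION = [-3.0, -1.0, 1.0, 3.0]  # assumed 4-PAM symbols known to the receiver

def f(x, a1, a3):
    """Assumed memoryless odd-order polynomial model of the frontend nonlinearity."""
    return a1 * x + a3 * x ** 3

def estimate_nonlinearity(y, iters=5):
    """Decision-directed least-squares estimate of (a1, a3) without training data."""
    a1, a3 = 1.0, 0.0  # start from an ideal linear frontend
    for _ in range(iters):
        # 1) Detect symbols using the current nonlinearity estimate.
        x_hat = [min(CONSTELLATION, key=lambda s: abs(yi - f(s, a1, a3))) for yi in y]
        # 2) Refit y ~ a1*x + a3*x^3 by solving the 2x2 normal equations.
        s11 = sum(x ** 2 for x in x_hat)
        s13 = sum(x ** 4 for x in x_hat)
        s33 = sum(x ** 6 for x in x_hat)
        b1 = sum(yi * x for yi, x in zip(y, x_hat))
        b3 = sum(yi * x ** 3 for yi, x in zip(y, x_hat))
        det = s11 * s33 - s13 ** 2
        a1 = (s33 * b1 - s13 * b3) / det
        a3 = (s11 * b3 - s13 * b1) / det
    return a1, a3

# Hypothetical received samples: compressive cubic distortion plus noise.
rng = random.Random(1)
true_a1, true_a3 = 1.0, -0.02
symbols = [rng.choice(CONSTELLATION) for _ in range(2000)]
y = [f(x, true_a1, true_a3) + rng.gauss(0.0, 0.05) for x in symbols]
a1_hat, a3_hat = estimate_nonlinearity(y)
print(round(a1_hat, 4), round(a3_hat, 5))
```

The estimated coefficients could then be handed to a mitigation stage (e.g. a predistorter or a nonlinear detector), which is the role the abstract describes for the estimator.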
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6855164
S. Kundu, Panos P. Markopoulos, D. Pados
Recently, Markopoulos et al. [1], [2] presented an optimal algorithm that computes the L1 maximum-projection principal component of any set of N real-valued data vectors of dimension D with complexity O(N^D), polynomial in N. Still, moderate to high values of the data dimension D and/or the data record size N may render the optimal algorithm unsuitable for practical implementation, because its complexity is exponential in D. In this paper, we present for the first time in the literature a fast, greedy, single-bit-flipping, conditionally optimal iterative algorithm for computing the L1 principal component with complexity O(N^3). Detailed numerical studies demonstrate the effectiveness of the developed algorithm, with applications to data dimensionality reduction and direction-of-arrival estimation.
Title: Fast computation of the L1-principal component of real-valued data | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8028-8032 | URL: https://doi.org/10.1109/ICASSP.2014.6855164
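The L1 principal component can be obtained from the binary problem max over b in {-1,+1}^N of ||X b||_2, with w = X b / ||X b||_2. The sketch below runs a greedy single-bit-flipping search on that problem and compares it with exhaustive search on tiny data. The all-ones initialization and the stopping rule are assumptions, and this naive version recomputes norms from scratch, so it illustrates the search, not the O(N^3) bound quoted above.

```python
import itertools
import math

def combo(vectors, b):
    """X b = sum_n b_n x_n for a sign vector b (vectors are the columns of X)."""
    D = len(vectors[0])
    return [sum(bn * x[d] for bn, x in zip(b, vectors)) for d in range(D)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def l1_objective(vectors, w):
    """||X^T w||_1 = sum_n |x_n . w|, the quantity the L1-PC maximizes."""
    return sum(abs(sum(a * c for a, c in zip(x, w))) for x in vectors)

def l1_pc_bitflip(vectors):
    """Greedy single-bit flipping on max_b ||X b||_2, b in {-1, +1}^N."""
    b = [1] * len(vectors)  # assumed initialization: all ones
    best = norm(combo(vectors, b))
    while True:
        # Try every single-bit flip and remember the best resulting norm.
        scores = []
        for n in range(len(b)):
            b[n] = -b[n]
            scores.append((norm(combo(vectors, b)), n))
            b[n] = -b[n]
        value, n_best = max(scores)
        if value <= best + 1e-12:
            break  # no single flip improves the metric: stop
        b[n_best] = -b[n_best]
        best = value
    w = [c / best for c in combo(vectors, b)]  # unit-norm L1-PC estimate
    return w, b, best

def l1_pc_exhaustive(vectors):
    """Optimal value by checking all 2^N sign vectors (tiny N only)."""
    return max(norm(combo(vectors, b))
               for b in itertools.product([-1, 1], repeat=len(vectors)))

data = [[1.0, 2.0], [2.0, -1.0], [-0.5, 1.5], [3.0, 0.5], [0.2, -2.0]]
w, b, best = l1_pc_bitflip(data)
print(best, l1_pc_exhaustive(data), l1_objective(data, w))
```

At the stopping point every sign b_n agrees with sign(x_n . w), so the L1 objective of the returned w equals the final value of ||X b||_2.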
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854298
Heysem Kaya, F. Eyben, A. A. Salah, Björn Schuller
In this study we use Canonical Correlation Analysis (CCA) based feature selection for continuous depression recognition from speech. Besides its common use in multi-modal/multi-view feature extraction, CCA can easily be employed as a feature selector. We introduce several novel CCA-based filter (ranking) methods and show their relations to previous work. We test the suitability of the proposed methods on the AVEC 2013 dataset under the ACM MM 2013 Challenge protocol. Using 17% of the features, we obtained a 30% relative improvement over the challenge's test-set baseline Root Mean Square Error.
Title: CCA based feature selection with application to continuous depression recognition from acoustic speech features | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3729-3733 | URL: https://doi.org/10.1109/ICASSP.2014.6854298
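In the simplest case, where one view is a single feature and the other is the one-dimensional depression score, the canonical correlation reduces to the absolute Pearson correlation. The sketch below ranks features by that score as a basic filter method; it is an assumed simplification, not one of the paper's CCA variants, and the feature names and values are hypothetical.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def rank_features(features, target):
    """Rank features by |corr| with the target (1-D canonical correlation).

    features: dict mapping feature name -> list of values (one per recording).
    target:   list of continuous labels (e.g. depression scores).
    """
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical acoustic features for six recordings.
target = [10, 14, 20, 25, 31, 40]
features = {
    "f_pos":   [1.0, 1.5, 2.1, 2.4, 3.2, 4.0],
    "f_neg":   [5.0, 4.6, 4.1, 3.4, 2.9, 2.0],
    "f_noise": [0.3, -0.2, 0.1, 0.4, -0.1, 0.0],
}
ranking, scores = rank_features(features, target)
selected = ranking[:2]  # keep only the top-ranked fraction of features
print(ranking, {k: round(v, 3) for k, v in scores.items()})
```

Taking the absolute value keeps strongly negatively correlated features, since a regressor can use them just as well as positively correlated ones.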
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854790
Yohan Lejosne, D. Slock, Y. Yuan-Wu
Ergodic interference alignment (IA) is a simple yet powerful tool that not only achieves the optimal K/2 degrees of freedom (DoF) of the K-user single-input single-output (SISO) interference channel (IC), but also allows each user to achieve at least half of its interference-free capacity at any SNR. By considering more general message sets, Nazer et al. also covered the MISO case. In this paper, we first consider the SIMO interference channel and extend ergodic IA techniques to this setting with N_r receive antennas. Our scheme achieves KN_r/(N_r + 1) DoF, which is the DoF yielded by (standard) IA and is also the DoF of the channel when K > N_r. Moreover, this technique exhibits spatial scale invariance. By combining the existing MISO and the new SIMO results, we can also cover MIMO with N_t transmit antennas for the cases where either N_t/N_r or N_r/N_t is an integer R, yielding DoF = min(N_t, N_r)KR/(R + 1), which is optimal for K > R.
Title: Ergodic interference alignment for the SIMO/MIMO interference channel | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6172-6175 | URL: https://doi.org/10.1109/ICASSP.2014.6854790
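The DoF expressions in the abstract can be checked with exact rational arithmetic. The sketch below encodes only the two formulas as stated (the function names are hypothetical) and confirms that the MIMO formula reduces to the SIMO one for N_t = 1 and to K/2 for SISO. It says nothing about achievability; the abstract states optimality only for K > R.

```python
from fractions import Fraction

def dof_simo(K, Nr):
    """Sum DoF K*Nr/(Nr + 1) of the SIMO ergodic IA scheme, as stated in the abstract."""
    return Fraction(K * Nr, Nr + 1)

def dof_mimo(K, Nt, Nr):
    """min(Nt, Nr)*K*R/(R + 1), defined when R = max/min of (Nt, Nr) is an integer."""
    lo, hi = min(Nt, Nr), max(Nt, Nr)
    if hi % lo != 0:
        raise ValueError("formula requires Nt/Nr or Nr/Nt to be an integer")
    R = hi // lo
    return Fraction(lo * K * R, R + 1)

print(dof_simo(4, 2))     # 8/3
print(dof_mimo(4, 1, 2))  # 8/3: SIMO is the special case Nt = 1, R = Nr
print(dof_mimo(6, 1, 1))  # 3: SISO recovers K/2
print(dof_mimo(5, 2, 4))  # 20/3: min = 2, R = 2
```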
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854762
Zeyu You, R. Raich, Yonghong Huang
We present an inference framework for automatic detection of activations of home appliances based on voltage envelope waveforms. We cast the problem of appliance detection and recognition as an inference problem. When the activation signatures are known, the problem reduces to a simple detection problem. When the activation signatures are unknown, the problem is reformulated as a blind joint delay estimation. Due to the non-convexity of the negative log-likelihood, finding a globally optimal solution is a key challenge. Here, we introduce a novel algorithm to estimate the activation templates, which is guaranteed to yield an error within a factor of two of that of the optimal solution. We apply our method to a real-world dataset consisting of voltage waveform measurements of several appliances obtained in multiple homes over a few weeks. Based on ground truth data, we present a quantitative analysis of the proposed algorithm and alternative approaches.
Title: An inference framework for detection of home appliance activation from voltage measurements | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6033-6037 | URL: https://doi.org/10.1109/ICASSP.2014.6854762
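For the known-signature case, "a simple detection problem" can be illustrated with a matched filter: slide a zero-mean copy of the template along the voltage envelope and report local maxima above a threshold. This is a generic sketch, not the paper's detector; the template, baseline voltage and threshold are hypothetical, and the blind (unknown-template) estimator with its factor-of-two guarantee is not shown.

```python
def matched_filter(signal, template):
    """Sliding inner product of the signal with a zero-mean template.

    Removing the template mean makes the score insensitive to the constant
    baseline voltage.
    """
    L = len(template)
    mean_t = sum(template) / L
    t0 = [t - mean_t for t in template]
    return [sum(signal[i + j] * t0[j] for j in range(L))
            for i in range(len(signal) - L + 1)]

def detect(signal, template, threshold):
    """Start indices where the score is a local maximum above the threshold."""
    s = matched_filter(signal, template)
    hits = []
    for i, v in enumerate(s):
        left = s[i - 1] if i > 0 else float("-inf")
        right = s[i + 1] if i + 1 < len(s) else float("-inf")
        if v >= threshold and v > left and v >= right:
            hits.append(i)
    return hits

# Hypothetical envelope: 120 V baseline with two activations starting at 3 and 12.
template = [0.0, 1.0, 2.0, 1.0, 0.0]
signal = [120.0] * 20
for start in (3, 12):
    for j, t in enumerate(template):
        signal[start + j] += t

print(detect(signal, template, threshold=2.0))  # [3, 12]
```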
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854595
Tomoyasu Nakano, Kazuyoshi Yoshii, Masataka Goto
This paper presents a vocal timbre analysis method based on topic modeling using latent Dirichlet allocation (LDA). Although many works have focused on analyzing characteristics of singing voices, none have dealt with “latent” characteristics (topics) of vocal timbre, which are shared by multiple singing voices. In the work described in this paper, we first automatically extracted vocal timbre features from polyphonic musical audio signals including vocal sounds. The extracted features were used as observed data, and mixing weights of multiple topics were estimated by LDA. Finally, the semantics of each topic were visualized using a word-cloud-based approach. Experimental results for a singer identification task using 36 songs sung by 12 singers showed that our method achieved a mean reciprocal rank of 0.86. We also proposed a method for estimating cross-gender vocal timbre similarity by generating pitch-shifted (frequency-warped) signals of every singing voice. Experimental results for a cross-gender singer retrieval task showed that our method discovered interesting pitch-shifted singers with similar vocal timbre.
Title: Vocal timbre analysis using latent Dirichlet allocation and cross-gender vocal timbre similarity | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5202-5206 | URL: https://doi.org/10.1109/ICASSP.2014.6854595
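The singer identification result is reported as a mean reciprocal rank (MRR). The sketch below shows how that metric is computed from ranked candidate lists; the LDA topic model itself is not reproduced, and the singer labels are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """Average of 1/rank of the correct item over all queries.

    ranked_lists[q]: candidate singers for query q, ordered best-first.
    relevant[q]:     the correct singer for query q (must appear in the list,
                     otherwise list.index raises ValueError).
    """
    total = 0.0
    for ranking, target in zip(ranked_lists, relevant):
        rank = ranking.index(target) + 1  # 1-based position of the correct singer
        total += 1.0 / rank
    return total / len(ranked_lists)

# Three hypothetical queries whose correct singer lands at ranks 1, 2 and 4.
ranked = [["A", "B", "C", "D"],
          ["B", "A", "D", "C"],
          ["C", "D", "B", "A"]]
truth = ["A", "A", "A"]
mrr = mean_reciprocal_rank(ranked, truth)
print(mrr)  # (1 + 1/2 + 1/4) / 3
```

An MRR of 0.86, as reported above, therefore means the correct singer was usually ranked first.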
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854447
P. Bilinski, J. Ahrens, Mark R. P. Thomas, I. Tashev, John C. Platt
We propose a method for the synthesis of the magnitudes of Head-related Transfer Functions (HRTFs) using a sparse representation of anthropometric features. Our approach treats the HRTF synthesis problem as finding a sparse representation of the subject's anthropometric features with respect to the anthropometric features in the training set. The fundamental assumption is that the magnitudes of a given HRTF set can be described by the same sparse combination as the anthropometric data. Thus, we learn a sparse vector that represents the subject's anthropometric features as a linear superposition of the anthropometric features of a small subset of subjects from the training data. Then, we apply the same sparse vector directly to the HRTF tensor data. For evaluation purposes, we use a new dataset containing both anthropometric features and HRTFs. We compare the proposed sparse-representation-based approach with ridge regression and with the data of a manikin (which was designed based on average anthropometric data), and we simulate the best and the worst possible classifiers to select one of the HRTFs from the dataset. For instrumental evaluation we use log-spectral distortion. Experiments show that our sparse representation outperforms all other evaluated techniques, and that the synthesized HRTFs are almost as good as the best possible HRTF classifier.
Title: HRTF magnitude synthesis via sparse representation of anthropometric features | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4468-4472 | URL: https://doi.org/10.1109/ICASSP.2014.6854447
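The key step of the approach, reusing the anthropometric sparse weights on the HRTF magnitudes, and the log-spectral distortion (LSD) metric can be sketched as below. The sketch assumes the sparse weights have already been learned (for example with an L1-regularized fit of the subject's anthropometric vector against those of the training subjects); that learning step is not shown. The LSD uses one common definition, and normalization details (frequency range, averaging over directions) may differ from the paper's. All data are hypothetical.

```python
import math

def synthesize(weights, training_hrtfs):
    """Apply anthropometric sparse weights to training HRTF magnitudes.

    H_new[f] = sum_i w_i * H_i[f] for one direction; training_hrtfs[i] is the
    magnitude response of training subject i over the frequency bins.
    """
    n_bins = len(training_hrtfs[0])
    return [sum(w * h[f] for w, h in zip(weights, training_hrtfs))
            for f in range(n_bins)]

def log_spectral_distortion(h_ref, h_est):
    """LSD in dB: sqrt(mean_f (20*log10(|H_ref| / |H_est|))^2)."""
    sq = [(20.0 * math.log10(abs(r) / abs(e))) ** 2 for r, e in zip(h_ref, h_est)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical magnitudes for 3 training subjects over 4 frequency bins.
training = [[1.0, 2.0, 3.0, 4.0],
            [2.0, 2.0, 2.0, 2.0],
            [0.5, 1.0, 1.5, 2.0]]
weights = [0.0, 0.5, 0.5]  # assumed sparse weights: subject 0 unused
h_new = synthesize(weights, training)
print(h_new)  # [1.25, 1.5, 1.75, 2.0]
print(log_spectral_distortion(training[1], h_new))
```

A uniform error of a factor of 10 in magnitude corresponds to an LSD of 20 dB, which gives a feel for the scale of the metric.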
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854312
S. Voloshynovskiy, M. Diephuis, T. Holotyak
Many state-of-the-art methods in image retrieval, classification and copy detection are based on the Bag-of-Features (BOF) framework. However, the performance of these systems is mostly evaluated experimentally, and few results have been reported on their theoretical performance. In this paper, we present a statistical framework that makes it possible to analyse the performance of a simple BOF system and to better understand the impact of different design elements such as the robustness of descriptors, the accuracy of encoding/assignment, information-preserving pooling and, finally, decision making. The proposed framework may also be of interest for a security and privacy analysis of BOF systems.
Title: Performance analysis of Bag-of-Features based content identification systems | Venue: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3799-3803 | URL: https://doi.org/10.1109/ICASSP.2014.6854312
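The design elements listed above (descriptors, encoding/assignment, pooling, decision) can be seen in a toy BOF pipeline. The sketch assumes hard nearest-codeword assignment, sum pooling into a normalized histogram and a cosine-similarity decision; these are typical choices made for illustration, not the paper's statistical framework, and the codebook and descriptors are hypothetical.

```python
import math

def assign(descriptor, codebook):
    """Encoding: index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])))

def bof_histogram(descriptors, codebook):
    """Pooling: count assignments per codeword, then normalize to sum to 1."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[assign(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

def cosine(a, b):
    """Decision statistic: cosine similarity between two histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
query = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [5.2, 4.9]]
same_item = [[0.0, 0.2], [1.1, 1.0], [0.8, 0.9], [4.8, 5.1]]  # distorted copy
other_item = [[5.1, 5.0], [4.9, 4.8], [5.0, 5.3], [0.2, 0.1]]

hq = bof_histogram(query, codebook)
hs = bof_histogram(same_item, codebook)
hd = bof_histogram(other_item, codebook)
threshold = 0.9  # assumed decision threshold
print(hq, cosine(hq, hs) >= threshold, cosine(hq, hd) >= threshold)
```

Each stage discards information (quantization in `assign`, loss of spatial layout in pooling), which is the kind of effect a statistical performance analysis of BOF systems tries to quantify.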