N. D. Vanli, M. O. Sayin, S. Ergüt, S. Kozat, "Piecewise nonlinear regression via decision adaptive trees," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.44014
We investigate the problem of adaptive nonlinear regression and introduce tree-based piecewise linear regression algorithms that are highly efficient and provide significantly improved performance with guaranteed upper bounds in an individual-sequence manner. We partition the regressor space using hyperplanes in a nested structure according to the notion of a tree. In this manner, we introduce an adaptive nonlinear regression algorithm that not only adapts the regressor of each partition but also learns the complete tree structure with a computational complexity that is only polynomial in the number of nodes of the tree. Our algorithm is constructed to directly minimize the final regression error without introducing any ad hoc parameters. Moreover, our method can be readily incorporated with any tree construction method, as demonstrated in the paper.
Songcen Xu, R. D. Lamare, H. Poor, "Distributed reduced-rank estimation based on joint iterative optimization in sensor networks," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.43837
This paper proposes a novel distributed reduced-rank scheme and an adaptive algorithm for distributed estimation in wireless sensor networks. The proposed distributed scheme is based on a transformation that performs dimensionality reduction at each agent of the network followed by a reduced-dimension parameter vector. A distributed reduced-rank joint iterative estimation algorithm is developed, which has the ability to achieve significantly reduced communication overhead and improved performance when compared with existing techniques. Simulation results illustrate the advantages of the proposed strategy in terms of convergence rate and mean square error performance.
Sabin Tiberius Strat, A. Benoît, P. Lambert, "Retina enhanced bag of words descriptors for video classification," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.44198
This paper addresses the task of detecting diverse semantic concepts in videos. Within this context, the Bag of Visual Words (BoW) model, inherited from the analysis of sampled video keyframes, is among the most popular methods. However, in the case of image sequences, this model faces new difficulties such as the added motion information, the extra computational cost and the increased variability of content and concepts to handle. Considering this spatio-temporal context, we propose to extend the BoW model by introducing video preprocessing strategies based on a retina model, applied before extracting the BoW descriptors. This preprocessing increases the robustness of local features to disturbances such as noise and lighting variations. Additionally, the retina model is used to detect potentially salient areas and to construct spatio-temporal descriptors. We experiment with three state-of-the-art local features, SIFT, SURF and FREAK, and evaluate our results on the TRECVid 2012 Semantic Indexing (SIN) challenge.
Armin Taghipour, M. Jaikumar, B. Edler, "A psychoacoustic model with Partial Spectral Flatness Measure for tonality estimation," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.43815
Psychoacoustic studies show that the strength of masking depends, among other factors, on the tonality of the masker: the effect of noise maskers is stronger than that of tone maskers. Recently, a Partial Spectral Flatness Measure (PSFM) was introduced for tonality estimation in a psychoacoustic model for perceptual audio coding. The model consists of an Infinite Impulse Response (IIR) filterbank which considers the spreading effect of individual local maskers in simultaneous masking. A PSFM optimized with respect to audio quality and computational efficiency is now compared to a similar psychoacoustic model with prediction-based tonality estimation at medium (48 kbit/s) and low (32 kbit/s) bit rates (mono) via subjective quality tests. Fifteen expert listeners participated in the subjective tests, and the results are presented and discussed. Additionally, we conducted the subjective tests with 15 non-expert consumers, whose results are also shown and compared to those of the experts.
Fabiane Rediess, R. Conceição, B. Zatt, M. Porto, L. Agostini, "Cost function optimization and its hardware design for the Sample Adaptive Offset of HEVC standard," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.44158
This work presents a cost function optimization for the internal decisions of the HEVC Sample Adaptive Offset (SAO) filter. The optimization is aimed at an efficient hardware implementation and explores two critical points: the use of fixed-point instead of floating-point data, and the reduction of the number of full multipliers and dividers. Simulation results show that these proposals have no significant impact on BD-rate. Based on these two hardware-friendly optimizations, we propose a hardware design for the cost function module. FPGA synthesis results show that the proposed architecture reaches 521 MHz and is able to process UHD 8K @ 120 fps while operating at 47 MHz.
J. Jensen, M. G. Christensen, "Near-field localization of audio: A maximum likelihood approach," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.43840
Localization of audio sources using microphone arrays has been an important research problem for more than two decades. Many traditional methods for solving the problem are based on a two-stage procedure: first, information about the audio source, such as time differences-of-arrival (TDOAs) and gain ratios-of-arrival (GROAs) between microphones, is estimated, and, second, this knowledge is used to localize the audio source. These methods often have a low computational complexity, but this comes at the cost of a limited estimation accuracy. Therefore, we propose a new localization approach, where the desired signal is modeled using TDOAs and GROAs, which are determined by the source location. This facilitates the derivation of one-stage, maximum likelihood methods under a white Gaussian noise assumption that is applicable in both near- and far-field scenarios. Simulations show that the proposed method is statistically efficient and outperforms state-of-the-art estimators in most scenarios, involving both synthetic and real data.
Dalia El Badawy, J. Ranieri, M. Vetterli, "Near-optimal sensor placement for signals lying in a union of subspaces," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.44165
Sensor networks are commonly deployed to measure data from the environment and accurately estimate certain parameters. However, the number of deployed sensors is often limited by several constraints, such as their cost. Therefore, their locations must be carefully optimized to enhance the estimation of the parameters. In a previous work, we considered a low-dimensional linear model for the measured data and proposed a near-optimal algorithm to optimize the sensor placement. In this paper, we propose to model the data as a union of subspaces to further reduce the number of sensors without degrading the quality of the estimation. Moreover, we introduce a greedy sensor placement algorithm for this model and show the near-optimality of its solution. Finally, numerical experiments verify that the proposed model reduces the number of sensors while keeping the estimation performance intact.
Joshua Atkins, Ismael Nawfal, D. Giacobello, "A unified approach to numerical auditory scene synthesis using loudspeaker arrays," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.44186
In this work we address the problem of simulating the spatial and timbral cues of a given sound event, or auditory scene, using an array of loudspeakers. We first define the problem with a general numerical framework that encompasses many known techniques from physical acoustics, crosstalk cancellation, and acoustic control. In contrast to many previous approaches, the system described in this work is inherently broadband as it jointly designs a set of spatio-temporal filters while allowing for constraints in other domains. With this framework we show similarities and differences between known techniques and suggest some new, unexplored methods. In particular, we focus on perceptually motivated choices for the cost function and regularization. These methods are then compared by implementing the systems on a linear array of loudspeakers and evaluating the timbral and spatial qualities of the system using objective metrics.
Sai Han, T. Fingscheidt, "Improving scalar quantization for correlated processes using adaptive codebooks only at the receiver," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.43845
Lloyd-Max quantization (LMQ) is a widely used scalar non-uniform quantization approach targeting the minimum mean squared error (MMSE). Once designed, the quantizer codebook is fixed over time and does not take advantage of possible correlations in the input signals. Correlation could be exploited in scalar quantization by predictive quantization, albeit at the price of a higher bit error sensitivity. In order to improve the Lloyd-Max quantizer performance for correlated processes without encoder-sided prediction, a novel scalar decoding approach exploiting the correlation of the input signals is proposed in this paper. Based on previously received samples, the current sample can be predicted a priori. Thereafter, a quantization codebook, adapted over time, is generated according to the prediction error probability density function. Compared to the standard LMQ, a distinct improvement is achieved with our receiver in both error-free and error-prone transmission conditions, with both hard-decision and soft-decision decoding.
O. Zoidi, A. Tefas, N. Nikolaidis, I. Pitas, "Iterative Label Propagation on facial images," in Proc. EUSIPCO 2014. DOI: 10.5281/ZENODO.44196
In this paper, a novel method is introduced for propagating person identity labels on facial images in an iterative manner. The proposed method takes into account information about the data structure, obtained through clustering. This information is exploited in two ways: to regulate the similarity strength between the data and to indicate which samples should be selected for label propagation initialization. The proposed method can also find application in label propagation on multiple graphs. The performance of the proposed Iterative Label Propagation (ILP) method was evaluated on facial images extracted from stereo movies. Experimental results show that the proposed method outperforms state-of-the-art methods when either one or both video channels are used for label propagation.