Pub Date : 2017-03-14 DOI: 10.1109/ICASSP.2017.7952409
Thierry Dumas, A. Roumy, C. Guillemot
This paper addresses the problem of image compression using sparse representations. We propose a variant of auto-encoder called Stochastic Winner-Take-All Auto-Encoder (SWTA AE). “Winner-Take-All” means that image patches compete with one another when computing their sparse representation, and “Stochastic” indicates that a stochastic hyperparameter rules this competition during training. Unlike conventional auto-encoders, SWTA AE performs variable-rate image compression for images of any size after a single training, which is fundamental for compression. For comparison, we also propose a variant of Orthogonal Matching Pursuit (OMP) called Winner-Take-All Orthogonal Matching Pursuit (WTA OMP). In terms of rate-distortion trade-off, SWTA AE outperforms auto-encoders but is worse than WTA OMP. In addition, SWTA AE can compete with JPEG in terms of rate-distortion.
{"title":"Image compression with Stochastic Winner-Take-All Auto-Encoder","authors":"Thierry Dumas, A. Roumy, C. Guillemot","doi":"10.1109/ICASSP.2017.7952409","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952409","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116828045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
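The winner-take-all competition described in the SWTA AE abstract can be illustrated with a small sketch. The abstract does not give the exact competition rule or the distribution of the stochastic hyperparameter, so the function `winner_take_all`, the `keep_fraction` parameter, and the uniform range below are all assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def winner_take_all(codes, keep_fraction):
    """Keep the largest-magnitude coefficients across ALL patches; zero the rest.

    codes: array of shape (n_patches, n_coefficients).
    keep_fraction: share of coefficients allowed to survive (hypothetical knob).
    """
    codes = np.asarray(codes, dtype=float)
    n_keep = max(1, int(round(keep_fraction * codes.size)))
    flat = np.abs(codes).ravel()
    # Indices of the n_keep largest magnitudes over the whole patch set,
    # so patches compete with one another for the sparsity budget.
    winners = np.argpartition(flat, -n_keep)[-n_keep:]
    mask = np.zeros(codes.size, dtype=bool)
    mask[winners] = True
    return codes * mask.reshape(codes.shape)

rng = np.random.default_rng(0)
codes = rng.normal(size=(8, 16))  # 8 patches, 16 coefficients each (toy sizes)

# "Stochastic": draw the sparsity level at random for each training step
# (the range 0.05 to 0.5 is an assumption made for this sketch).
keep_fraction = rng.uniform(0.05, 0.5)
sparse_codes = winner_take_all(codes, keep_fraction)
```

Because the budget is shared, a patch with strong coefficients can keep many of them while a flat patch keeps few or none. Varying `keep_fraction` at test time is one plausible way a single trained model could cover several rates.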
Pub Date : 2017-03-14 DOI: 10.1109/ICASSP.2017.7952955
Daan H. M. Schellekens, Thomas W. Sherson, R. Heusdens
Large-scale networks of computing units, often characterised by the absence of central control, have become commonplace in many applications. To facilitate data processing in these large-scale networks, distributed signal processing is required. The iterative behaviour of distributed processing algorithms, combined with the energy, computational power, and bandwidth limitations imposed by such networks, places tight constraints on the transmission capacities of the individual nodes. In this paper we investigate the effects of subtractive dithered uniform quantisation in the primal-dual method of multipliers (PDMM) for the synchronous distributed averaging problem. This is done by deriving expressions for the mean squared error (MSE) that include quantisation noise. The required data rate for quantised PDMM is also considered. It was found that for practical applications quantisation in PDMM can be applied with a fixed-rate quantiser, such that a significant data rate reduction can be achieved without compromising the rate of convergence.
{"title":"Quantisation effects in PDMM: A first study for synchronous distributed averaging","authors":"Daan H. M. Schellekens, Thomas W. Sherson, R. Heusdens","doi":"10.1109/ICASSP.2017.7952955","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952955","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129181305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
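Subtractive dithered uniform quantisation, as named in the PDMM abstract, can be sketched in a few lines. The sketch shows only the quantiser, not PDMM itself; the function name, step size, and test signal are assumptions for illustration. In a real network the receiver would have to regenerate the same dither, for example from a shared pseudo-random seed.

```python
import numpy as np

def subtractive_dither_quantise(x, step, rng):
    """Uniform quantiser with subtractive dither of cell width `step`."""
    # Dither is uniform over exactly one quantisation cell.
    dither = rng.uniform(-step / 2, step / 2, size=np.shape(x))
    # Value that would actually be transmitted (an integer multiple of step).
    quantised = step * np.round((x + dither) / step)
    # The receiver subtracts the same dither to form the reconstruction.
    return quantised - dither

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)  # toy input signal
step = 0.25                  # hypothetical quantiser step size
x_hat = subtractive_dither_quantise(x, step, rng)
error = x_hat - x

# The error is bounded by step / 2, and its variance is close to step**2 / 12.
print(np.max(np.abs(error)), error.var(), step**2 / 12)
```

The appeal of this scheme for MSE analysis is that the reconstruction error behaves like uniform noise that is statistically independent of the input, which makes it tractable to carry through an iterative algorithm.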
Pub Date : 2017-03-13 DOI: 10.1109/ICASSP.2017.7952470
Insu Jeon, Deokyoung Kang, S. Yoo
In this paper, we solve the blind image deconvolution problem, which is to remove blur from a degraded image without any knowledge of the blur kernel. Since the problem is ill-posed, an image prior plays a significant role in accurate blind deconvolution. Traditional image priors assume that coefficients in filtered domains are sparse. However, it is assumed here that there exist additional structures over the sparse coefficients. Accordingly, we propose a new problem formulation for blind image deconvolution, which utilizes the structural information by coupling a Student's-t image prior with overlapping group sparsity. The proposed method results in an effective blind deconvolution algorithm that outperforms other state-of-the-art algorithms.
{"title":"Blind image deconvolution using Student's-t prior with overlapping group sparsity","authors":"Insu Jeon, Deokyoung Kang, S. Yoo","doi":"10.1109/ICASSP.2017.7952470","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952470","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127683674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-03-13 DOI: 10.1109/ICASSP.2017.7952680
Shan Wu, Shangfei Wang, Zhen Gao
The inherent dependencies among video content, personal characteristics, and perceptual emotion are crucial for personalized video emotion tagging, but have not been thoroughly exploited. To address this, we propose a novel topic model to capture such inherent dependencies. We assume that there are several potential human factors, or “topics,” that affect the personal characteristics and the personalized emotion responses to videos. During training, the proposed topic model exploits the latent space to model the relationships among personal characteristics, video content, and video tagging behaviors. After learning, the proposed model can generate meaningful latent topics, which help personalized video emotion tagging. Efficient learning and inference algorithms for the model are proposed. Experimental results on the CP-QAE-I database demonstrate the effectiveness of the proposed approach in modeling complex relationships among video content, personal characteristics, and perceptual emotion, as well as its good performance in personalized video emotion tagging.
{"title":"Personalized video emotion tagging through a topic model","authors":"Shan Wu, Shangfei Wang, Zhen Gao","doi":"10.1109/ICASSP.2017.7952680","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952680","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114466603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-03-13 DOI: 10.1109/ICASSP.2017.7953314
Trond F. Bergh, Ines Hafizovic, S. Holm
We have devised a greedy method for finding solutions to the sparse Deconvolution Approach for the Mapping of Acoustic Sources inverse problem using a variant of Orthogonal Matching Pursuit. The algorithm has two stages: the first stage iteratively selects a subset of the basis vectors via a regularized inverse of the point spread function, and the second stage constructs point source solutions from this basis subset and its coefficients via hierarchical agglomerative clustering. We have evaluated the algorithm on both synthetic and real data, and show that the overall accuracy in terms of direction of arrival and reconstructed source power is better than that of four other state-of-the-art methods.
{"title":"Acoustic imaging of sparse Sources with Orthogonal Matching Pursuit and clustering of basis vectors","authors":"Trond F. Bergh, Ines Hafizovic, S. Holm","doi":"10.1109/ICASSP.2017.7953314","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953314","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125059932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
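For readers unfamiliar with the greedy selection step that the acoustic imaging abstract builds on, here is a minimal sketch of plain Orthogonal Matching Pursuit. It is the textbook algorithm, not the authors' variant: the regularized inverse of the point spread function and the clustering stage are not reproduced, and the dictionary `D`, the function name `omp`, and the toy sizes are assumptions.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Plain OMP: greedily pick columns of D, then refit by least squares."""
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom helps; stop early
        support.append(k)
        # Orthogonal step: refit all selected coefficients jointly.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = y - D[:, support] @ sol
    return coef

rng = np.random.default_rng(0)
# Toy orthonormal dictionary, a case where exact recovery is guaranteed.
D, _ = np.linalg.qr(rng.normal(size=(20, 20)))
true_coef = np.zeros(20)
true_coef[3], true_coef[11] = 2.0, -1.5
y = D @ true_coef

estimate = omp(D, y, n_nonzero=2)
```

With an overcomplete or highly coherent dictionary, as is typical for point spread functions, this plain selection rule can pick the wrong atoms, which is the kind of situation a modified selection step is meant to address.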
Pub Date : 2017-03-13 DOI: 10.1109/ICASSP.2017.7952973
Feng Xi, Shengyao Chen, Zhong Liu
This paper studies the estimation of the delay and Doppler parameters in sub-Nyquist radars. By formulating delay-Doppler estimation as a low-rank matrix recovery problem, we propose an estimation approach based on atomic norm minimization. From the recovered low-rank matrix, we determine and pair the delay and Doppler parameters of the radar targets. Numerical simulations demonstrate the superior performance of the proposed approach compared to state-of-the-art approaches.
{"title":"Super-resolution delay-Doppler estimation for sub-Nyquist radar via atomic norm minimization","authors":"Feng Xi, Shengyao Chen, Zhong Liu","doi":"10.1109/ICASSP.2017.7952973","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952973","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131291898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-03-13 DOI: 10.1109/ICASSP.2017.7952681
Yangyang Shu, Shangfei Wang
The inherent dependencies among multiple physiological signals are crucial for multimodal emotion recognition, but have not yet been thoroughly exploited. This paper proposes to use a restricted Boltzmann machine (RBM) to model such dependencies. Specifically, the visible nodes of the RBM represent EEG and peripheral physiological signals, and thus the connections between visible nodes and hidden nodes capture the intrinsic relations among multiple physiological signals. The RBM generates a new representation from multiple physiological signals. Then, a support vector machine is adopted to recognize users' emotion states from the generated features. Furthermore, we extend the proposed fusion method to incomplete data, since physiological signals are often corrupted by artifacts. Specifically, we pre-train the RBM using all the complete data, then update the missing values and RBM parameters to minimize the free energy of visible vectors using both complete and incomplete data. Experiments on two benchmark databases demonstrate the effectiveness of the proposed methods.
{"title":"Emotion recognition through integrating EEG and peripheral signals","authors":"Yangyang Shu, Shangfei Wang","doi":"10.1109/ICASSP.2017.7952681","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952681","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121914493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
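The idea of updating missing values to minimize the free energy of a visible vector, mentioned in the emotion recognition abstract, can be sketched for the simplest RBM. The sketch assumes binary hidden units and visible values in [0, 1]; the abstract does not state the unit types, and physiological features may well call for Gaussian visible units. The names `free_energy`, `free_energy_grad_v`, and `impute_missing`, the learning rate, and the step count are hypothetical, and only the missing-value update is shown, not the parameter update.

```python
import numpy as np

def free_energy(v, W, a, b):
    """F(v) = -a.v - sum_j log(1 + exp(b_j + (v W)_j)) for binary hidden units."""
    return -(v @ a) - np.sum(np.logaddexp(0.0, b + v @ W))

def free_energy_grad_v(v, W, a, b):
    """Gradient of F with respect to the visible vector v."""
    hidden_prob = 1.0 / (1.0 + np.exp(-(b + v @ W)))
    return -a - W @ hidden_prob

def impute_missing(v, missing, W, a, b, lr=0.1, n_steps=100):
    """Projected gradient descent on F over the missing entries only."""
    v = v.copy()
    for _ in range(n_steps):
        grad = free_energy_grad_v(v, W, a, b)
        v[missing] -= lr * grad[missing]
        v[missing] = np.clip(v[missing], 0.0, 1.0)  # stay in the assumed range
    return v

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4                 # toy sizes
W = rng.normal(scale=0.5, size=(n_visible, n_hidden))
a = rng.normal(scale=0.1, size=n_visible)  # visible biases
b = rng.normal(scale=0.1, size=n_hidden)   # hidden biases

v = rng.uniform(size=n_visible)
missing = np.array([False, True, False, False, True, False])
v[missing] = 0.5                           # neutral starting guess
v_filled = impute_missing(v, missing, W, a, b)
```

Observed entries are left untouched; only the masked entries move, and each step cannot increase the free energy because F is concave in `v` for this model.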
Pub Date : 2017-03-12 DOI: 10.1109/ICASSP.2017.7952197
Jacob Donley, C. Ritz, W. Kleijn
In this paper, we investigate the effects of compensating for wave-domain filtering delay in an active speech control system. An active control system utilising wave-domain processed basis functions is evaluated for a linear array of dipole secondary sources. The target control soundfield is matched in a least squares sense using orthogonal wavefields to a predicted future target soundfield. Filtering is implemented using a block-based short-time signal processing approach which induces an inherent delay. We present an autoregressive method for predictively compensating for the filter delay. An approach to block-length choice that maximises the soundfield control is proposed for a trade-off between soundfield reproduction accuracy and prediction accuracy. Results show that block-length choice has a significant effect on the active suppression of speech.
{"title":"Active speech control using wave-domain processing with a linear wall of dipole secondary sources","authors":"Jacob Donley, C. Ritz, W. Kleijn","doi":"10.1109/ICASSP.2017.7952197","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952197","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114355611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
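The autoregressive compensation of block-processing delay mentioned in the active speech control abstract can be illustrated with a generic least-squares AR predictor. The abstract does not specify the model order, the fitting method, or how the prediction feeds the wave-domain filter, so `fit_ar`, `predict_ahead`, the order, and the block length below are assumptions for this sketch.

```python
import numpy as np

def fit_ar(signal, order):
    """Least-squares AR fit: x[n] is modelled as sum_k coeffs[k] * x[n-1-k]."""
    # Each row holds the `order` samples preceding x[n], most recent first.
    rows = [signal[n - order:n][::-1] for n in range(order, len(signal))]
    targets = signal[order:]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), targets, rcond=None)
    return coeffs

def predict_ahead(signal, coeffs, n_ahead):
    """Extrapolate n_ahead samples by feeding predictions back into the model."""
    history = list(signal[-len(coeffs):])  # chronological order
    predicted = []
    for _ in range(n_ahead):
        nxt = float(np.dot(coeffs, history[::-1]))  # most recent sample first
        predicted.append(nxt)
        history = history[1:] + [nxt]
    return np.array(predicted)

# Toy signal: a sinusoid, which an order-2 AR model reproduces exactly.
n = np.arange(232)
full = np.cos(0.2 * n)
past, future = full[:200], full[200:]

block_length = 32  # hypothetical delay, in samples, introduced by block filtering
coeffs = fit_ar(past, order=2)
forecast = predict_ahead(past, coeffs, block_length)
```

Speech is far less predictable than a sinusoid, so prediction error grows with the horizon. This is consistent with the trade-off the abstract describes: longer blocks improve reproduction accuracy but require predicting further ahead.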
Pub Date : 2017-03-12 DOI: 10.1109/ICASSP.2017.7952757
S. Li, Dehui Yang, M. Wakin
Identifying characteristic vibrational modes and frequencies is of great importance for monitoring the health of structures such as buildings and bridges. In this work, we address the problem of estimating the modal parameters of a structure from small amounts of vibrational data collected from wireless sensors distributed on the structure. We consider a randomized spatial compression scheme for minimizing the amount of data that is collected and transmitted by the sensors. Using the recent technique of atomic norm minimization, we show that under certain conditions exact recovery of the mode shapes and frequencies is possible. In addition, in a simulation based on synthetic data, our method outperforms a singular value decomposition (SVD) based method for modal analysis that uses the uncompressed data set.
{"title":"Atomic norm minimization for modal analysis with random spatial compression","authors":"S. Li, Dehui Yang, M. Wakin","doi":"10.1109/ICASSP.2017.7952757","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952757","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125290959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2017-03-12 DOI: 10.1109/ICASSP.2017.7952849
I. Clarkson
Sidelobe suppression has always been an important part of crafting communications signals to keep interference with users of adjacent spectrum to a minimum. Systems based on the discrete Fourier transform (DFT), such as orthogonal frequency-division multiplexing (OFDM) and single-carrier frequency-division multiple access (SC-FDMA), are especially prone to out-of-band power leakage. Although many techniques have been proposed to suppress sidelobes in DFT-based systems, a satisfactory balance between computational complexity and out-of-band power leakage has remained elusive.
{"title":"Orthogonal precoding for sidelobe suppression in DFT-based systems using block reflectors","authors":"I. Clarkson","doi":"10.1109/ICASSP.2017.7952849","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952849","url":null,"PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114447649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}