Pub Date: 2017-03-01 DOI: 10.1109/ICASSP.2017.7952993
Maksim Butsenko, Johan Sward, A. Jakobsson
In this paper, we present a technique for reducing the size of the dictionary in sparse signal reconstruction by formulating an initial dictionary containing elements that span bands of the considered parameter space. We allow for the use of this banded dictionary in a first-stage estimation procedure, in which large parts of the parameter space are discarded from further analysis, thereby reducing the overall computational complexity required for reliable signal reconstruction. We illustrate the presented principle on the problem of estimating sinusoidal components corrupted by white noise.
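The banded-dictionary idea in the abstract above can be sketched for the sinusoidal case. This is a minimal illustration, not the authors' estimator: it assumes each dictionary element is the integral of a complex sinusoid over a frequency band, and it replaces the sparse first-stage estimation with a simple correlation screen. The function names, the `keep_ratio` parameter, and the test signal are hypothetical.

```python
import cmath
import math


def band_atom(f_lo, f_hi, n_samples):
    """Integrate exp(2j*pi*f*t) over f in [f_lo, f_hi] for t = 0..n_samples-1."""
    atom = []
    for t in range(n_samples):
        if t == 0:
            atom.append(complex(f_hi - f_lo))  # limit of the closed form at t = 0
        else:
            diff = cmath.exp(2j * math.pi * f_hi * t) - cmath.exp(2j * math.pi * f_lo * t)
            atom.append(diff / (2j * math.pi * t))
    return atom


def band_scores(signal, n_bands):
    """Correlation magnitude between the signal and each unit-norm band atom."""
    n = len(signal)
    scores = []
    for k in range(n_bands):
        atom = band_atom(k / n_bands, (k + 1) / n_bands, n)
        norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
        inner = sum(a.conjugate() * s for a, s in zip(atom, signal))
        scores.append(abs(inner) / norm)
    return scores


def retained_bands(signal, n_bands, keep_ratio=0.5):
    """Indices of bands scoring at least keep_ratio times the best band (assumed rule)."""
    scores = band_scores(signal, n_bands)
    cutoff = keep_ratio * max(scores)
    return [k for k, s in enumerate(scores) if s >= cutoff]


# Noise-free complex sinusoid at normalised frequency 0.30, 64 samples.
signal = [cmath.exp(2j * math.pi * 0.30 * t) for t in range(64)]
print(retained_bands(signal, n_bands=8))
```

Only the retained bands would then need a fine frequency grid in a second stage, which is where the reduction in dictionary size would come from.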
{"title":"Estimating sparse signals using integrated wide-band dictionaries","doi":"10.1109/ICASSP.2017.7952993","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952993","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"4426-4430"},"publicationDate":"2017-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"78338591"}
Pub Date: 2017-03-01 DOI: 10.1109/ICASSP.2017.7952997
Seyedehsara Nayer, Namrata Vaswani, Yonina C. Eldar
We study the problem of recovering a low-rank matrix, X, from phaseless measurements of random linear projections of its columns. We develop a novel solution approach, called AltMinTrunc, that consists of a two-step truncated spectral initialization followed by a three-step alternating minimization algorithm. We obtain sample complexity bounds under which the AltMinTrunc initialization provides a good approximation of the true X. When the rank of X is low enough, these bounds are significantly smaller than what existing single-vector phase retrieval algorithms need. Via extensive experiments, we demonstrate the same for the entire algorithm.
{"title":"Low rank phase retrieval","doi":"10.1109/ICASSP.2017.7952997","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952997","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"9 1","pages":"4446-4450"},"publicationDate":"2017-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"80910499"}
Pub Date: 2016-03-20 DOI: 10.1109/ICASSP.2016.7471962
Chang Su, Li Tao
In this paper, we propose a fast, dictionary-free example-based super-resolution (EBSR) algorithm to resolve the tension in EBSR methods between their high visual quality and their low efficiency and high cost. With a novel cross-scale high-frequency components (HFC) self-learning strategy, the missing HFC of a high-resolution (HR) image are approximated from its low-resolution counterparts. A high-quality estimate of the HR image is then obtained by adding the HFC to its initial guess. Simulations show that the proposed algorithm achieves results comparable to state-of-the-art EBSR methods but with much higher efficiency and lower cost.
{"title":"A real-time example-based single-image super-resolution algorithm via cross-scale high-frequency components self-learning","doi":"10.1109/ICASSP.2016.7471962","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7471962","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"49 1","pages":"1676-1680"},"publicationDate":"2016-03-20","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"81532488"}
Pub Date: 2015-04-19 DOI: 10.1109/ICASSP.2015.7178453
Vincent Mohammad Tavakoli, Jesper Rindom Jensen, Mads Graesboll Christensen, Jacob Benesty
Speech enhancement with distributed arrays has been addressed with various methods. On the one hand, data-independent methods require information about the position of sensors, so they are not suitable for dynamic geometries. On the other hand, Wiener-based methods cannot assure a distortionless output. This paper proposes minimum variance distortionless response filtering based on multichannel pseudo-coherence for speech enhancement with ad hoc microphone arrays. This method requires neither position information nor control of the trade-off used in the distortion-weighted methods. Furthermore, certain performance criteria are derived in terms of the pseudo-coherence vector, and the method is compared with the multichannel Wiener filter. Evaluation shows the suitability of the proposed method in terms of noise reduction with minimum distortion in ad hoc scenarios.
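For readers unfamiliar with the distortionless constraint mentioned above, the following is a minimal two-channel sketch of a generic MVDR filter, w = R^{-1} g / (g^H R^{-1} g). It assumes the noise covariance R and a steering-like vector g (standing in for the pseudo-coherence vector, with the first channel as reference) are already known. How the paper estimates these quantities is not shown; the function name and the numbers are hypothetical.

```python
def mvdr_weights_2ch(noise_cov, gamma):
    """Generic MVDR weights for a 2x2 Hermitian noise covariance and a 2-vector gamma."""
    (a, b), (c, d) = noise_cov
    det = a * d - b * c
    # Explicit inverse of the 2x2 covariance matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    r_inv_g = [
        inv[0][0] * gamma[0] + inv[0][1] * gamma[1],
        inv[1][0] * gamma[0] + inv[1][1] * gamma[1],
    ]
    # gamma^H R^{-1} gamma, the normaliser that enforces w^H gamma = 1.
    denom = sum(g.conjugate() * v for g, v in zip(gamma, r_inv_g))
    return [v / denom for v in r_inv_g]


# Hypothetical values: correlated noise and a vector normalised to the reference channel.
noise_cov = [[1.0 + 0j, 0.2 + 0.1j], [0.2 - 0.1j, 1.5 + 0j]]
gamma = [1.0 + 0j, 0.6 - 0.3j]
weights = mvdr_weights_2ch(noise_cov, gamma)

# Distortionless check: the filter passes the desired component with unit gain.
response = sum(w.conjugate() * g for w, g in zip(weights, gamma))
print(abs(response - 1.0) < 1e-9)
```

The unit-gain check is what distinguishes this filter from a Wiener-type solution, which trades some signal distortion for additional noise reduction.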
{"title":"Pseudo-coherence-based MVDR beamformer for speech enhancement with ad hoc microphone arrays","doi":"10.1109/ICASSP.2015.7178453","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178453","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"2659-2663"},"publicationDate":"2015-04-19","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"81468499"}
Pub Date: 2015-04-19 DOI: 10.1109/ICASSP.2015.7179031
Monica F. Bugallo, Angela M. Kelly
We report on a university-based pilot initiative to introduce students in grades 9–12 to electrical engineering practices. The after-school program consisted of two modules of four two-hour sessions and targeted students from two different local schools. They were exposed to hands-on electronic activities as well as programming practices related to image processing. The data collected from weekly surveys revealed that students found the program more challenging and engaging as the course progressed and they were motivated to pursue future engineering study. Additional schools in the region have requested the opportunity for their students to participate in the program at the university.
{"title":"An outreach after-school program to introduce high-school students to electrical engineering","doi":"10.1109/ICASSP.2015.7179031","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7179031","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"20 1","pages":"5540-5544"},"publicationDate":"2015-04-19","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"73801141"}
Pub Date: 2014-07-14 DOI: 10.1109/ICASSP.2014.6854221
Laming Chen, Yuantao Gu
The existing literature suggests that sparsity is more likely to be induced with non-convex penalties, but the corresponding algorithms usually suffer from multiple local minima. In this paper, we introduce a class of sparsity-inducing penalties and provide convergence guarantees for a non-convex approach to sparse recovery using regularized least squares. Theoretical analysis demonstrates that, under certain conditions, if the non-convexity of the penalty is below a threshold (which is inversely proportional to the distance between the initialization and the sparse signal), the sparse signal can be stably recovered. Numerical simulations are carried out to verify the theoretical results and to compare the performance of this approach with that of reference methods.
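As a concrete instance of regularized least squares with a non-convex penalty, the sketch below uses the minimax concave penalty (MCP) and a plain proximal-gradient iteration. Both are substitutions chosen for illustration and are not taken from the paper; `lam`, `gamma`, and `step` are assumed parameters. In MCP the quantity `1 / gamma` controls how non-convex the penalty is, which loosely mirrors the role of the non-convexity threshold in the abstract.

```python
def firm_threshold(z, lam, gamma, step=1.0):
    """Proximal map of step * MCP(lam, gamma) at z; requires gamma > step."""
    mag = abs(z)
    if mag <= step * lam:
        return 0.0  # small entries are set exactly to zero
    if mag >= gamma * lam:
        return z  # large entries are left unbiased
    shrunk = (mag - step * lam) / (1.0 - step / gamma)
    return shrunk if z > 0 else -shrunk


def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def rmatvec(a, r):
    """Multiply the transpose of matrix a by vector r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def mcp_least_squares(a, y, lam, gamma, step=1.0, n_iter=50):
    """Proximal gradient on 0.5 * ||a x - y||^2 + sum of MCP(x_j), starting from zero."""
    x = [0.0] * len(a[0])
    for _ in range(n_iter):
        residual = [p - q for p, q in zip(matvec(a, x), y)]
        grad = rmatvec(a, residual)
        x = [firm_threshold(x_j - step * g_j, lam, gamma, step)
             for x_j, g_j in zip(x, grad)]
    return x


# Hypothetical example: orthonormal columns, so step = 1.0 is a valid step size.
a = [[0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.5, 0.5, -0.5], [0.5, -0.5, -0.5]]
y = [1.6, 1.4, 1.55, 1.5]  # a @ [3, 0, 0] plus a small perturbation
print(mcp_least_squares(a, y, lam=0.5, gamma=3.0))
```

For a general matrix, `step` should not exceed the reciprocal of the largest eigenvalue of the Gram matrix of `a`. The result may also depend on the starting point, which is the local-minima issue the abstract refers to.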
{"title":"The convergence guarantees of a non-convex approach for sparse recovery using regularized least squares","doi":"10.1109/ICASSP.2014.6854221","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854221","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"162 1","pages":"3350-3354"},"publicationDate":"2014-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"73486620"}
Pub Date: 2014-05-04 DOI: 10.1109/ICASSP.2014.6853569
P. Gay, E. Khoury, S. Meignier, J. Odobez, P. Deléglise
We investigate the problem of audio-visual (AV) person diarization in broadcast data, that is, automatically associating the faces and voices of people and determining when they appear or speak in the video. The contributions are twofold. First, we formulate the problem within a novel CRF framework that simultaneously performs the AV association of voices and face clusters to build AV person models, and the joint segmentation of the audio and visual streams using a set of AV cues and their association strength. Second, for this AV association strength we use a score that relies not only on lip activity but also on contextual visual information (face size, position, number of detected faces, ...), which leads to more reliable association measures. Experiments on 6 hours of broadcast data show that our framework improves AV person diarization, especially for speaker segments erroneously labeled in the mono-modal case.
{"title":"A conditional random field approach for audio-visual people diarization","doi":"10.1109/ICASSP.2014.6853569","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6853569","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"116-120"},"publicationDate":"2014-05-04","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"87983603"}
Pub Date: 2014-05-04 DOI: 10.1109/ICASSP.2014.6854062
K. Shirota, Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda
A statistical parametric approach to singing voice synthesis based on hidden Markov models (HMMs) has been growing in popularity over the last few years. In this approach, the spectrum, excitation, vibrato, and duration of singing voices are modeled simultaneously with context-dependent HMMs, and waveforms are generated from the HMMs themselves. Because HMM-based singing voice synthesis systems are “corpus-based,” their performance depends heavily on the training data. HMMs corresponding to contextual factors that rarely appear in the training data therefore cannot be well trained. Pitch coverage is especially important, since generated F0 trajectories have a great impact on the subjective quality of synthesized singing voices. In this paper, we apply the method of “speaker adaptive training” (SAT) to pitch, yielding “pitch adaptive training.” This technique makes it possible to normalize pitch based on musical notes in the training process. The experimental results demonstrate that the proposed technique can alleviate the data sparseness problem.
{"title":"Pitch adaptive training for hmm-based singing voice synthesis","doi":"10.1109/ICASSP.2014.6854062","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854062","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"31 1","pages":"5377-5380"},"publicationDate":"2014-05-04","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"86653818"}
Pub Date: 2014-01-01 DOI: 10.1109/ICASSP.2014.6853825
Yuhong Yang, Hongjiang Yu, R. Hu, Li Gao, Song Wang, Qing Zhai, Songbo Xie
{"title":"Auditory attention based mobile audio quality assessment","doi":"10.1109/ICASSP.2014.6853825","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6853825","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"25 1","pages":"1389-1393"},"publicationDate":"2014-01-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"73749375"}
Pub Date: 2013-05-26 DOI: 10.1109/ICASSP.2013.6638620
Junting Chen, V. Lau
Multi-user multi-input-multi-output (MU-MIMO) systems usually require users to feed back the channel state information (CSI) for scheduling. Most of the existing literature on reduced-feedback user scheduling has focused on throughput performance and has usually ignored the queueing delay. As delay is important for real-time applications, it is desirable to have a low-feedback, queue-aware user scheduling algorithm for MU-MIMO systems. This paper proposes a two-timescale queue-aware user scheduling algorithm, which consists of a queue-aware mobile-driven feedback filtering stage and an SINR-based user scheduling stage. The feedback policy is obtained by solving a queue-weighted optimization problem. In addition, we evaluate the associated queueing delay performance using large deviation analysis. The large deviation decay rate for the proposed algorithm is shown to be much larger than that of the CSI-only scheduling algorithm. Numerical results demonstrate the large performance gain of the proposed algorithm over the CSI-only algorithm, while the proposed algorithm requires only a small amount of feedback.
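The two-stage structure described above (users filter their own feedback, then the base station schedules on SINR) can be illustrated with a toy sketch. The threshold rule `base_threshold / (1 + queue_len)` is purely hypothetical and stands in for the paper's feedback policy, which is derived from a queue-weighted optimization not reproduced here. All names and numbers are assumptions.

```python
def users_feeding_back(sinr, queue_len, base_threshold):
    """Users whose SINR clears a queue-dependent threshold (hypothetical rule).

    A longer queue lowers the threshold, so backlogged users report more often.
    """
    return [
        u for u in range(len(sinr))
        if sinr[u] >= base_threshold / (1.0 + queue_len[u])
    ]


def schedule(sinr, queue_len, base_threshold):
    """Pick the highest-SINR user among those that fed back, or None if nobody did."""
    candidates = users_feeding_back(sinr, queue_len, base_threshold)
    if not candidates:
        return None
    return max(candidates, key=lambda u: sinr[u])


# Hypothetical snapshot: user 0 has a weak channel but a backlog of 4 packets.
sinr = [2.0, 6.0, 9.0]
queue_len = [4, 0, 0]
print(schedule(sinr, queue_len, base_threshold=10.0))
```

In this snapshot only the backlogged user reports, so feedback is sparse and the scheduler still serves the queue that is most at risk of delay. A CSI-only rule would have required all three users to report.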
{"title":"Large deviation delay analysis of queue-aware multi-user MIMO systems with two timescale mobile-driven feedback","doi":"10.1109/ICASSP.2013.6638620","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638620","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"89 1","pages":"5036-5040"},"publicationDate":"2013-05-26","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"84232648"}