Scalable Multilevel Quantization for Distributed Detection
G. Gul, Michael Basler
Pub Date: 2021-06-06. DOI: 10.1109/ICASSP39728.2021.9414032. Pages: 5200-5204.
A scalable algorithm is derived for multilevel quantization of sensor observations in distributed sensor networks, which consist of a number of sensors transmitting summary information about their observations to a fusion center for a final decision. The proposed algorithm directly minimizes the overall error probability of the network, without resorting to pseudo objective functions such as distances between probability distributions. The problem formulation makes it possible to combine globally optimum error minimization at the fusion center with person-by-person optimum quantization at each sensor. The complexity of the algorithm is quasi-linear for i.i.d. sensors. Experimental results indicate that the proposed scheme outperforms the current state of the art.
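The abstract's key idea, quantizer thresholds tuned one at a time to minimize the final error probability directly, can be illustrated with a toy sketch. This is not the paper's algorithm: it is a single Gaussian sensor (N(0,1) under H0, N(1,1) under H1), a MAP decision per quantizer cell standing in for the fusion rule, and a grid-based coordinate descent standing in for person-by-person optimization; all names and the grid are illustrative.

```python
import math

def phi(x, mu):
    # CDF of a unit-variance Gaussian with mean mu
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def cell_probs(thresholds, mu):
    # probability mass of each quantizer cell under N(mu, 1)
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    return [phi(b, mu) - phi(a, mu) for a, b in zip(edges, edges[1:])]

def error_prob(thresholds, mu0=0.0, mu1=1.0, pi0=0.5):
    # overall Bayes error with a MAP decision per quantizer cell:
    # each cell contributes the smaller of its weighted masses
    p0 = cell_probs(thresholds, mu0)
    p1 = cell_probs(thresholds, mu1)
    return sum(min(pi0 * a, (1 - pi0) * b) for a, b in zip(p0, p1))

def pbp_optimize(levels=4, sweeps=20):
    # person-by-person style coordinate descent: re-optimize one
    # threshold at a time over a grid, holding the others fixed
    grid = [i / 100.0 - 3.0 for i in range(601)]
    t = [0.25 * k for k in range(1, levels)]
    for _ in range(sweeps):
        for i in range(len(t)):
            t[i] = min(grid, key=lambda g: error_prob(t[:i] + [g] + t[i + 1:]))
    return sorted(t), error_prob(t)
```

With these means the unquantized Bayes error is Q(0.5) ≈ 0.3085, and a 4-level quantizer with a threshold at 0.5 attains it, so the descent should settle there.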
Linear Model-Based Intra Prediction in VVC Test Model
R. G. Youvalari
Pub Date: 2020-01-01. DOI: 10.1109/ICASSP40776.2020.9054405. Pages: 4417-4421.
Practical Concentric Open Sphere Cardioid Microphone Array Design for Higher Order Sound Field Capture
Mark R. P. Thomas
Pub Date: 2019-05-07. DOI: 10.1109/ICASSP.2019.8683559. Pages: 666-670.
The problem of higher order sound field capture with spherical microphone arrays is considered. While A-format cardioid designs are commonplace for first order capture, interest remains in the increased spatial resolution delivered by higher order arrays. Spherical arrays typically use omnidirectional microphones mounted on a rigid baffle, from which higher order spatial components are estimated by accounting for radial mode strength. This produces a design trade-off between small arrays, for spatial aliasing performance, and large arrays, for reduced amplification of instrument noise at low frequencies. A practical open sphere design is proposed that contains cardioid microphones mounted at multiple radii to fulfill both criteria. A design example with two spheres of 16-channel cardioids at 42 mm and 420 mm radius produces white noise gain above unity on third order components down to 200 Hz, a decade lower than a rigid 32-channel 42 mm sphere of omnidirectional microphones.
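The radial mode strength mentioned in the abstract explains why cardioids help on an open sphere: for omnidirectional capsules the order-0 mode strength is the spherical Bessel function j0(ka), which has nulls, while radially oriented cardioids add a j0'(ka) term so the magnitude never vanishes. A minimal sketch for order 0 only, using the closed forms j0(x) = sin(x)/x and j0'(x) = -j1(x) (the function names are ours, not the paper's):

```python
import math

def j0(x):
    # spherical Bessel function of order 0
    return math.sin(x) / x

def j1(x):
    # spherical Bessel function of order 1
    return math.sin(x) / x**2 - math.cos(x) / x

def b_omni(ka):
    # open sphere, omni capsules: |b_0| = |j_0(ka)|, nulls at ka = pi, 2*pi, ...
    return abs(j0(ka))

def b_cardioid(ka):
    # open sphere, radial cardioids: b_0 = j_0(ka) - 1j * j_0'(ka)
    # and j_0'(x) = -j_1(x), so |b_0| = |j_0(ka) + 1j * j_1(ka)|
    return abs(complex(j0(ka), j1(ka)))
```

At ka = pi the omni mode strength collapses (j0(pi) = 0) while the cardioid design keeps |b0| = 1/pi ≈ 0.318, which is the robustness the multi-radius cardioid array trades on.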
Trigonometric Interpolation Beamforming for a Circular Microphone Array
C. Schuldt
Pub Date: 2019-05-01. DOI: 10.1109/ICASSP.2019.8682843. Pages: 431-435.
Polynomial beamforming has previously been proposed for addressing the non-trivial problem of integrating acoustic echo cancellation with adaptive microphone beamforming. This paper demonstrates a design example for a circular array where traditional polynomial beamforming approaches exhibit severe (over 10 dB) directivity index (DI) oscillations at the edges of the design interval, leading to severe DI degradation for certain look directions. A solution, based on trigonometric interpolation, is proposed that stabilizes the oscillations significantly, resulting in a DI that deviates only about 1 dB from that of a fixed beamformer over all look directions.
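The intuition behind the proposed fix is that beamformer quantities on a circular array are periodic in the look direction, and a trigonometric (DFT-based) interpolant respects that periodicity where a polynomial does not. A small, self-contained sketch of trigonometric interpolation from P equally spaced angle samples (P odd); this illustrates the interpolation scheme only, not the paper's filter design:

```python
import cmath
import math

def trig_interp(samples, theta):
    # Interpolate a periodic function from samples at theta_n = 2*pi*n/P
    # by evaluating its centered DFT series at an arbitrary angle theta.
    P = len(samples)  # assumed odd, so harmonics center symmetrically
    out = 0.0
    for k in range(P):
        kk = k if k <= P // 2 else k - P  # map to -(P-1)/2 .. (P-1)/2
        ck = sum(samples[n] * cmath.exp(-2j * math.pi * k * n / P)
                 for n in range(P)) / P
        out += (ck * cmath.exp(1j * kk * theta)).real
    return out
```

For a bandlimited periodic function the interpolation is exact: sampling cos(theta) at 5 angles and evaluating anywhere reproduces cos(theta), with no edge-of-interval oscillation to speak of.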
Improving ASR Robustness to Perturbed Speech Using Cycle-consistent Generative Adversarial Networks
Sri Harsha Dumpala, I. Sheikh, Rupayan Chakraborty, Sunil Kumar Kopparapu
Pub Date: 2019-05-01. DOI: 10.1109/ICASSP.2019.8683793. Pages: 5726-5730.
Naturally introduced perturbations in audio signals, caused by emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on a Cycle-Consistent Generative Adversarial Network (CycleGAN) which transforms naturally perturbed speech into normal speech, and hence improves the robustness of an ASR system. The CycleGAN model is trained on non-parallel examples of perturbed and normal speech. Experiments on spontaneous laughter-speech and creaky voice datasets show that the performance of four different ASR systems improves when using speech obtained from the CycleGAN-based front-end, as compared to directly using the original perturbed speech. Visualization of the features of the laughter-perturbed speech and those generated by the proposed front-end further demonstrates the effectiveness of our approach.
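The detail that makes the abstract's setup work is training on non-parallel data, which is exactly what CycleGAN's cycle-consistency loss enables: the perturbed-to-normal generator G and normal-to-perturbed generator F must invert each other, so no aligned pairs are needed. A hedged toy sketch of that loss term alone (the generators here are placeholder callables, not the paper's networks, and the adversarial terms are omitted):

```python
def cycle_consistency_loss(G, F, xs, ys, lam=10.0):
    # L_cyc = E_x |F(G(x)) - x| + E_y |G(F(y)) - y|
    # G: perturbed -> normal, F: normal -> perturbed.
    # Penalizing round-trip error is what lets training use
    # NON-parallel corpora of perturbed (xs) and normal (ys) speech.
    fwd = sum(abs(F(G(x)) - x) for x in xs) / len(xs)
    bwd = sum(abs(G(F(y)) - y) for y in ys) / len(ys)
    return lam * (fwd + bwd)
```

As a sanity check, a pair of mappings that are exact inverses (say G adds a constant and F subtracts it) incurs zero cycle loss, while mismatched mappings are penalized in proportion to their round-trip error.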
Blind Quality Evaluator for Screen Content Images via Analysis of Structure
Guanghui Yue, Chunping Hou, Weisi Lin
Pub Date: 2019-05-01. DOI: 10.1109/ICASSP.2019.8682371. Pages: 4050-4054.
Existing blind evaluators for screen content images (SCIs) are mainly learning-based and require a number of training images with co-registered human opinion scores. However, the size of existing databases is small, and generating human opinion scores at scale is labor-intensive, time-consuming, and expensive. In this study, we propose a novel blind quality evaluator that requires no training. Specifically, the proposed method first calculates the gradient similarity between a distorted image and its translated versions in four directions to estimate the structural distortion, the most obvious distortion in SCIs. Given that edge regions are more susceptible to distortion, the inter-scale gradient similarity is then calculated as a weighting map. Finally, the proposed score is derived by combining the gradient similarity map with the weighting map. Experimental results demonstrate its effectiveness and efficiency on a publicly available SCI database.
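The first step the abstract describes, gradient similarity between an image and its one-pixel translations in four directions, can be sketched directly. This is a simplified reading of that step with an SSIM-style similarity ratio, not the authors' exact formulation, and the weighting map is omitted:

```python
import numpy as np

def grad_mag(img):
    # gradient magnitude via central differences
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def structure_score(img, c=1e-4):
    # Mean gradient similarity between the image and its one-pixel
    # translations up/down/left/right; c stabilizes flat regions.
    g = grad_mag(img)
    sims = []
    for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
        gt = grad_mag(np.roll(img, shift, axis=axis))
        sims.append((2 * g * gt + c) / (g**2 + gt**2 + c))
    return float(np.mean(sims))
```

By the AM-GM inequality the per-pixel ratio never exceeds 1, so a perfectly self-similar (e.g. constant) image scores exactly 1 and structural disruption pulls the score down.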
Human Behaviour Recognition Using Wifi Channel State Information
Daanish Ali Khan, Saquib Razak, B. Raj, Rita Singh
Pub Date: 2019-05-01. DOI: 10.1109/ICASSP.2019.8682821. Pages: 7625-7629.
Device-free human behaviour recognition is the automatic recognition of physical activity from a series of observations, without attaching sensors directly to the subject. Behaviour recognition has applications in security, health care, and smart homes. The ubiquity of WiFi devices has generated recent interest in Channel State Information (CSI), which describes the propagation of RF signals, for behaviour recognition, leveraging the relationship between body movement and variations in CSI streams. Existing work on CSI-based behaviour recognition has established the efficacy of deep neural network classifiers, yielding performance that surpasses traditional techniques. In this paper, we propose a deep Recurrent Neural Network (RNN) model for CSI-based behaviour recognition that combines a Convolutional Neural Network (CNN) feature extractor with stacked Long Short-Term Memory (LSTM) networks for sequence classification. We also examine CSI de-noising techniques that allow faster training and model convergence. Our model yields a significant improvement in classification accuracy compared to existing techniques.
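The de-noising step the abstract mentions is worth a concrete sketch: body motion modulates CSI amplitude at low frequencies, so a simple low-pass filter removes much of the measurement noise before the sequence model sees the stream. A moving-average filter is used here purely as a stand-in; the paper's actual de-noising techniques may differ:

```python
import numpy as np

def denoise_csi(stream, w=5):
    # Moving-average low-pass over one CSI amplitude stream.
    # Body-motion dynamics are slow relative to per-packet noise,
    # so averaging w consecutive samples suppresses the noise floor.
    kernel = np.ones(w) / w
    return np.convolve(np.asarray(stream, dtype=float), kernel, mode="valid")
```

With mode="valid" the output has len(stream) - w + 1 samples and no zero-padded edge artifacts, which keeps the downstream CNN-LSTM input free of boundary transients.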
A Novel Deterministic Sensing Matrix Based on Kasami Codes for Cluster Structured Sparse Signals
Hamid Nouasria, Mohamed Et-tolba
Pub Date: 2019-05-01. DOI: 10.1109/ICASSP.2019.8683593. Pages: 1592-1596.
Cluster structured compressive sensing is a new direction within compressive sensing that deals with cluster structured sparse (CSS) signals. In this paper, we propose a sensing matrix based on Kasami codes for CSS signals. Kasami codes admit several constructions, and our idea is to adapt these constructions to CSS signals. The proposed matrix gives more attention to the clusters. Simulation results show the superior performance of our matrix: it achieves the highest rate of exact recovery. Moreover, the deterministic nature of our matrix makes it more suitable for hardware implementation.
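Kasami codes are built from a maximal-length sequence (m-sequence) and its decimation, and their appeal for deterministic sensing matrices comes from provably small correlations. As a simplified, hedged sketch of that ingredient (not the paper's full Kasami construction), here is the period-15 m-sequence and a matrix of its bipolar cyclic shifts, whose columns have pairwise inner product exactly -1:

```python
def mseq15():
    # m-sequence of period 15 from the recurrence a[n+4] = a[n+1] XOR a[n]
    # (primitive polynomial x^4 + x + 1), nonzero seed 0001
    a, seq = [0, 0, 0, 1], []
    for _ in range(15):
        seq.append(a[0])
        a = a[1:] + [a[1] ^ a[0]]
    return seq

def shift_matrix():
    # 15 columns = cyclic shifts of the +/-1 mapped sequence.
    # The two-valued autocorrelation of m-sequences makes every pair of
    # distinct columns have inner product -1: low, known coherence.
    s = [1 - 2 * b for b in mseq15()]
    return [[s[(i + j) % 15] for j in range(15)] for i in range(15)]
```

That fixed, predictable cross-correlation is the property a deterministic sensing matrix exploits, and it is what makes such matrices attractive for hardware: the entries come from a shift register, not a random number generator.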
Embedding Physical Augmentation and Wavelet Scattering Transform to Generative Adversarial Networks for Audio Classification with Limited Training Resources
Kah Kuan Teh, T. H. Dat
Pub Date: 2019-05-01. DOI: 10.1109/ICASSP.2019.8683199. Pages: 3262-3266.
This paper addresses audio classification with limited training resources. We first investigate different types of data augmentation, including physical modeling, the wavelet scattering transform, and Generative Adversarial Networks (GANs). We then propose a novel GAN method that embeds physical augmentation and the wavelet scattering transform in processing. Experimental results on Google Speech Commands show significant improvements from the proposed method when training with limited resources. It lifts classification accuracy from the best ResNet baselines of 62.06% and 77.29% to 91.96% and 93.38% when training with 10% and 25% of the training data, respectively.
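The wavelet scattering transform the paper embeds can be caricatured in a few lines: band-pass filter the signal with wavelet-like atoms, take the complex modulus, then average locally, which yields features that are stable to small time shifts and deformations. The atoms and parameters below are our own illustrative choices, not the paper's:

```python
import numpy as np

def morlet_atom(freq, width=64):
    # crude Morlet-like atom: complex exponential under a Gaussian window
    t = np.arange(width) - width / 2
    return np.exp(1j * freq * t) * np.exp(-(t**2) / (2 * (width / 6) ** 2))

def scatter1(x, freqs=(0.2, 0.5, 1.0), pool=8):
    # first-order scattering sketch: |x * psi_f|, then local averaging
    feats = []
    for f in freqs:
        u = np.abs(np.convolve(x, morlet_atom(f), mode="same"))
        n = (u.size // pool) * pool  # trim so the pooling reshape is exact
        feats.append(u[:n].reshape(-1, pool).mean(axis=1))
    return np.concatenate(feats)
```

Because the modulus and averaging discard fine phase detail while keeping envelope energy per band, the same utterance shifted by a few samples maps to nearly the same feature vector, which is why scattering features help when training data is scarce.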
Joint parameter and state estimation for wave-based imaging and inversion
T. Leeuwen
Pub Date: 2017-03-01. DOI: 10.1109/ICASSP.2017.7953350. Pages: 6210-6214.
In many applications, such as exploration geophysics, seismology, and ultrasound imaging, waves are harnessed to image the interior of an object. We can pose the image formation process as a non-linear data-fitting problem: fit the coefficients of a wave equation such that its solution approximately fits the observations. This allows one to deal effectively with errors in the observations.
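The "approximately fits" phrasing points at a penalty formulation of joint parameter-and-state estimation: rather than forcing the wavefield to solve the wave equation exactly, one estimates a state u that balances data fit against physics fit. A linear-algebra sketch of the state-estimation step, for a fixed parameter choice (matrix names A, P and the helper are our own, illustrative of the idea rather than the paper's notation):

```python
import numpy as np

def reconstruct_wavefield(A, P, q, d, lam):
    # Find the state u minimizing ||P u - d||^2 + lam * ||A u - q||^2:
    # A u = q is the (discretized) wave equation with source q,
    # P samples the field at the receivers, d is the observed data.
    # Normal equations: (P^T P + lam A^T A) u = P^T d + lam A^T q.
    lhs = P.T @ P + lam * (A.T @ A)
    rhs = P.T @ d + lam * (A.T @ q)
    return np.linalg.solve(lhs, rhs)
```

Because the physics enters only as a penalty, noisy observations d no longer need to be explained by an exact wave-equation solution, which is how the formulation absorbs errors in the data; when data and physics agree exactly, the reconstruction reproduces the true field for any penalty weight.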