
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Articles

Scalable Multilevel Quantization for Distributed Detection
Pub Date: 2021-06-06 DOI: 10.1109/ICASSP39728.2021.9414032
G. Gul, Michael Basler
A scalable algorithm is derived for multilevel quantization of sensor observations in distributed sensor networks, in which a number of sensors transmit summary information about their observations to a fusion center that makes the final decision. The proposed algorithm directly minimizes the overall error probability of the network instead of minimizing surrogate (pseudo) objective functions such as distances between probability distributions. The problem formulation makes it possible to pursue both globally optimum error minimization at the fusion center and person-by-person optimum quantization at each sensor. For i.i.d. sensors, the complexity of the algorithm is quasi-linear. Experimental results indicate that the proposed scheme outperforms the current state of the art.
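To make "directly minimizing the overall error probability" concrete, here is a brute-force Python sketch; it is not the authors' quasi-linear algorithm. It assumes a textbook setting chosen for illustration: each sensor observes N(0, 1) under H0 and N(mu, 1) under H1, all sensors share the same thresholds, priors are equal, and the fusion center applies the MAP rule to the received quantization levels. With i.i.d. sensors and identical quantizers the fusion center only needs the number of sensors reporting each level, so the exact error probability is a sum over multinomial count vectors, and a grid search over two thresholds minimizes it directly. All helper names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF; math.erf maps +/-inf to +/-1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cell_probs(thresholds, mean):
    """Probability that an N(mean, 1) observation falls in each quantizer cell."""
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    return [normal_cdf(hi - mean) - normal_cdf(lo - mean)
            for lo, hi in zip(edges, edges[1:])]

def compositions(n, k):
    """All count vectors distributing n sensors over k quantization levels."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, probs):
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # each partial quotient stays an integer
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

def network_error(thresholds, n_sensors, mu, prior0=0.5):
    """Exact error probability with identical quantizers and MAP fusion."""
    p0 = cell_probs(thresholds, 0.0)  # H0: observations ~ N(0, 1)
    p1 = cell_probs(thresholds, mu)   # H1: observations ~ N(mu, 1)
    err = 0.0
    for counts in compositions(n_sensors, len(p0)):
        # MAP fusion picks the larger joint probability; the smaller one is the error.
        err += min(prior0 * multinomial_pmf(counts, p0),
                   (1.0 - prior0) * multinomial_pmf(counts, p1))
    return err

# Direct minimization over a 3-level (2-threshold) quantizer by exhaustive grid search
grid = [i / 10 for i in range(-10, 21)]
best_err, best_t = min(
    (network_error([t1, t2], n_sensors=6, mu=1.0), (t1, t2))
    for t1 in grid for t2 in grid if t1 < t2
)
print(f"thresholds {best_t}: error probability {best_err:.4f}")
```

Enumerating count vectors grows combinatorially with the number of sensors and levels, which is the kind of cost a scalable, quasi-linear algorithm is meant to avoid.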
Pages: 5200-5204
Citations: 1
Linear Model-Based Intra Prediction in VVC Test Model
Pub Date: 2020-01-01 DOI: 10.1109/ICASSP40776.2020.9054405
R. G. Youvalari
Pages: 4417-4421
Citations: 1
Practical Concentric Open Sphere Cardioid Microphone Array Design for Higher Order Sound Field Capture
Mark R. P. Thomas
The problem of higher-order sound field capture with spherical microphone arrays is considered. While A-format cardioid designs are commonplace for first-order capture, there is continued interest in the increased spatial resolution delivered by higher-order arrays. Spherical arrays typically use omnidirectional microphones mounted on a rigid baffle, from which higher-order spatial components are estimated by accounting for the radial mode strength. This leads to a design trade-off between small arrays, which perform better with respect to spatial aliasing, and large arrays, which reduce the amplification of instrument noise at low frequencies. A practical open-sphere design is proposed in which cardioid microphones are mounted at multiple radii to satisfy both criteria. A design example using two 16-channel cardioid spheres with radii of 42 mm and 420 mm produces a white noise gain above unity on third-order components down to 200 Hz, a decade lower than a rigid 32-channel, 42 mm sphere of omnidirectional microphones.
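The aliasing/noise trade-off can be made concrete with the common kr ≈ N rule of thumb for spherical arrays, which is an assumption here rather than a figure from the paper: order-N components are usable up to roughly kr = N, and at any fixed frequency the 420 mm sphere has ten times the kr of the 42 mm sphere, which matters at low frequencies because higher-order mode strength falls off as kr decreases. The speed of sound (343 m/s) and the function names are also assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C

def helmholtz_number(freq_hz, radius_m, c=SPEED_OF_SOUND):
    """kr = 2*pi*f*r/c, the dimensionless size of the array at frequency f."""
    return 2.0 * math.pi * freq_hz * radius_m / c

def aliasing_limit_hz(order, radius_m, c=SPEED_OF_SOUND):
    """Frequency at which kr reaches the order N (rule-of-thumb aliasing limit)."""
    return order * c / (2.0 * math.pi * radius_m)

for r in (0.042, 0.420):  # the two radii quoted in the abstract
    print(f"r = {r * 1000:.0f} mm: kr at 200 Hz = {helmholtz_number(200, r):.3f}, "
          f"3rd-order aliasing limit ~ {aliasing_limit_hz(3, r):.0f} Hz")
```

Under this rule of thumb the small sphere keeps third-order components alias-free up to several kHz, while the large sphere, whose aliasing limit is ten times lower, provides the larger kr that low frequencies need.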
Pub Date: 2019-05-07 DOI: 10.1109/ICASSP.2019.8683559 Pages: 666-670
Citations: 4
Trigonometric Interpolation Beamforming for a Circular Microphone Array
C. Schuldt
Polynomial beamforming has previously been proposed to address the non-trivial problem of integrating acoustic echo cancellation with adaptive microphone beamforming. This paper demonstrates a design example for a circular array in which traditional polynomial beamforming approaches exhibit severe (over 10 dB) oscillations of the directivity index (DI) at the edges of the design interval, leading to severe DI degradation for certain look directions. A solution based on trigonometric interpolation is proposed that significantly reduces these oscillations, yielding a DI that deviates by only about 1 dB from that of a fixed beamformer over all look directions.
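Look directions of a circular array wrap around, so any quantity that varies with look direction is periodic, and trigonometric interpolation respects that periodicity. The sketch below is a hypothetical one-dimensional stand-in (a single beamformer weight as a function of look angle), not the paper's design: it fits a trigonometric polynomial to values at M equally spaced design directions using discrete Fourier sums. With M odd, the interpolant passes through every sample and reproduces any trigonometric polynomial of degree at most (M - 1)/2 exactly.

```python
import math

def trig_interpolator(samples):
    """Trigonometric interpolant of values sampled at M equally spaced look
    directions theta_j = 2*pi*j/M around the full circle (M odd)."""
    m = len(samples)
    if m % 2 == 0:
        raise ValueError("this sketch assumes an odd number of design directions")
    k_max = (m - 1) // 2
    nodes = [2.0 * math.pi * j / m for j in range(m)]
    a0 = sum(samples) / m
    a = [2.0 / m * sum(f * math.cos(k * t) for f, t in zip(samples, nodes))
         for k in range(1, k_max + 1)]
    b = [2.0 / m * sum(f * math.sin(k * t) for f, t in zip(samples, nodes))
         for k in range(1, k_max + 1)]

    def interpolate(theta):
        # Periodic by construction: the design interval has no "edges".
        return a0 + sum(a[k - 1] * math.cos(k * theta) + b[k - 1] * math.sin(k * theta)
                        for k in range(1, k_max + 1))
    return interpolate

def weight(theta):
    """Hypothetical weight of one microphone versus look direction."""
    return 0.3 + math.cos(theta) + 0.5 * math.sin(2 * theta)

M = 7
interp = trig_interpolator([weight(2 * math.pi * j / M) for j in range(M)])
print(interp(1.234), weight(1.234))
```

A polynomial in theta has no such built-in periodicity, which is one intuition for why polynomial interpolation can misbehave near the ends of the design interval.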
Pub Date: 2019-05-01 DOI: 10.1109/ICASSP.2019.8682843 Pages: 431-435
Citations: 3
Improving ASR Robustness to Perturbed Speech Using Cycle-consistent Generative Adversarial Networks
Sri Harsha Dumpala, I. Sheikh, Rupayan Chakraborty, Sunil Kumar Kopparapu
Naturally occurring perturbations in the audio signal, caused by the emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on a Cycle-Consistent Generative Adversarial Network (CycleGAN) that transforms naturally perturbed speech into normal speech and hence improves the robustness of an ASR system. The CycleGAN model is trained on non-parallel examples of perturbed and normal speech. Experiments on spontaneous laughter-speech and creaky-voice datasets show that the performance of four different ASR systems improves when they are given speech produced by the CycleGAN-based front-end rather than the original perturbed speech. Visualizations of the features of laughter-perturbed speech and of those generated by the proposed front-end further demonstrate the effectiveness of our approach.
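Training on non-parallel data relies on the CycleGAN cycle-consistency term, L_cyc = E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1, where G maps perturbed to normal speech features and F maps back. The minimal sketch below computes only this term; the adversarial losses, network architectures, features and loss weights used in the paper are not given in the abstract, and the affine "generators" are hypothetical toys.

```python
def l1_distance(a, b):
    """Mean absolute difference between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(G, F, perturbed_batch, normal_batch):
    """E_x |F(G(x)) - x|_1 + E_y |G(F(y)) - y|_1 over two unpaired batches.
    G maps perturbed -> normal features and F maps normal -> perturbed."""
    forward = sum(l1_distance(F(G(x)), x) for x in perturbed_batch) / len(perturbed_batch)
    backward = sum(l1_distance(G(F(y)), y) for y in normal_batch) / len(normal_batch)
    return forward + backward

# Toy generators (hypothetical): elementwise affine maps on feature vectors.
def G(v):
    return [0.5 * x + 1.0 for x in v]

def F_inverse(v):     # exactly undoes G, so the cycle loss is ~0
    return [2.0 * (x - 1.0) for x in v]

def F_mismatched(v):  # does not undo G, so the cycle loss is large
    return [2.0 * x for x in v]

perturbed = [[0.2, 1.5, -0.3], [1.0, 0.0, 2.0]]  # unpaired with `normal`
normal = [[0.1, 0.4, 0.9]]
print(cycle_consistency_loss(G, F_inverse, perturbed, normal))
print(cycle_consistency_loss(G, F_mismatched, perturbed, normal))
```

Because the loss compares each input only with its own round trip, no perturbed/normal pairs of the same utterance are needed.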
Pub Date: 2019-05-01 DOI: 10.1109/ICASSP.2019.8683793 Pages: 5726-5730
Citations: 2
Blind Quality Evaluator for Screen Content Images via Analysis of Structure
Guanghui Yue, Chunping Hou, Weisi Lin
Existing blind quality evaluators for screen content images (SCIs) are mainly learning-based and require many training images with co-registered human opinion scores. However, existing databases are small, and collecting human opinion scores on a large scale is labor-intensive, time-consuming and expensive. In this study, we propose a novel blind quality evaluator that requires no training. Specifically, the proposed method first calculates the gradient similarity between a distorted image and its versions translated in four directions to estimate structural distortion, the most obvious type of distortion in SCIs. Because edge regions are more susceptible to distortion, the inter-scale gradient similarity is then calculated and used as a weighting map. Finally, the quality score is derived by combining the gradient similarity map with the weighting map. Experimental results on a publicly available SCI database demonstrate the method's effectiveness and efficiency.
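A minimal sketch of the first step (gradient similarity between an image and its four one-pixel translations), assuming central-difference gradients, replicated borders, the common similarity form (2·g1·g2 + c)/(g1² + g2² + c) and a stabilizing constant c = 1e-3. None of these choices come from the abstract, and the inter-scale weighting map is omitted. Images are nested lists of grayscale values in [0, 1].

```python
import math

def gradient_magnitude(img):
    """Central-difference gradient magnitude, replicating border pixels."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = (img[i][min(j + 1, w - 1)] - img[i][max(j - 1, 0)]) / 2.0
            gy = (img[min(i + 1, h - 1)][j] - img[max(i - 1, 0)][j]) / 2.0
            mag[i][j] = math.hypot(gx, gy)
    return mag

def translate(img, di, dj):
    """Shift the image by (di, dj) pixels, replicating border pixels."""
    h, w = len(img), len(img[0])
    return [[img[min(max(i - di, 0), h - 1)][min(max(j - dj, 0), w - 1)]
             for j in range(w)] for i in range(h)]

def gradient_similarity(g1, g2, c=1e-3):
    """Pixelwise (2*g1*g2 + c) / (g1^2 + g2^2 + c): 1 where gradients agree, < 1 otherwise."""
    return [[(2 * a * b + c) / (a * a + b * b + c) for a, b in zip(r1, r2)]
            for r1, r2 in zip(g1, g2)]

def translation_gradient_similarity(img, c=1e-3):
    """Mean gradient similarity between `img` and its four one-pixel translations."""
    g = gradient_magnitude(img)
    total, count = 0.0, 0
    for di, dj in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        sim = gradient_similarity(g, gradient_magnitude(translate(img, di, dj)), c)
        total += sum(map(sum, sim))
        count += len(sim) * len(sim[0])
    return total / count

# A vertical step edge: its gradient map moves when the image is shifted horizontally
step = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
score = translation_gradient_similarity(step)
print(score)
```

Flat regions score exactly 1, so in this sketch the score is driven entirely by edges and structure, which is consistent with the abstract's focus on structural distortion.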
Pub Date: 2019-05-01 DOI: 10.1109/ICASSP.2019.8682371 Pages: 4050-4054
Citations: 2
Human Behaviour Recognition Using Wifi Channel State Information
Daanish Ali Khan, Saquib Razak, B. Raj, Rita Singh
Device-Free Human Behaviour Recognition is the task of automatically recognizing physical activity from a series of observations without attaching sensors directly to the subject. Behaviour Recognition has applications in security, health care and smart homes. The ubiquity of WiFi devices has generated recent interest in using Channel State Information (CSI), which describes the propagation of RF signals, for behaviour recognition, leveraging the relationship between body movement and variations in CSI streams. Existing work on CSI-based behaviour recognition has established the efficacy of deep neural network classifiers, whose performance surpasses that of traditional techniques. In this paper, we propose a deep Recurrent Neural Network (RNN) model for CSI-based Behaviour Recognition that uses a Convolutional Neural Network (CNN) feature extractor with stacked Long Short-Term Memory (LSTM) networks for sequence classification. We also examine CSI de-noising techniques that allow faster training and model convergence. Our model yields a significant improvement in classification accuracy compared with existing techniques.
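The abstract does not say which CSI de-noising techniques were examined. As an assumed example only, the sketch below applies a Hampel filter, a median/MAD outlier filter often used on CSI amplitude streams, to one subcarrier's amplitude sequence; the window size, threshold and sample data are illustrative.

```python
import statistics

def hampel_filter(x, half_window=3, n_sigmas=3.0):
    """Replace each sample that deviates from its local median by more than
    n_sigmas scaled MADs with that median; windows are truncated at the ends."""
    mad_to_sigma = 1.4826  # scales the MAD to a standard deviation for Gaussian data
    y = list(x)
    for i in range(len(x)):
        window = x[max(0, i - half_window): i + half_window + 1]
        med = statistics.median(window)
        sigma = mad_to_sigma * statistics.median([abs(v - med) for v in window])
        if abs(x[i] - med) > n_sigmas * sigma:
            y[i] = med
    return y

# Hypothetical CSI amplitude stream for one subcarrier, with one impulsive outlier
amplitude = [1.0, 1.1, 0.9, 1.0, 1.05, 8.0, 1.0, 0.95, 1.1, 1.0]
cleaned = hampel_filter(amplitude)
print(cleaned)
```

The median and MAD are robust to the outlier itself, so the spike is replaced while the small natural fluctuations around it are left untouched.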
Pub Date: 2019-05-01 DOI: 10.1109/ICASSP.2019.8682821 Pages: 7625-7629
Citations: 10
A Novel Deterministic Sensing Matrix Based on Kasami Codes for Cluster Structured Sparse Signals
Hamid Nouasria, Mohamed Et-tolba
Cluster-structured compressive sensing is a new direction in compressive sensing that deals with cluster-structured sparse (CSS) signals. In this paper, we propose a sensing matrix based on Kasami codes for CSS signals. Kasami codes have been the basis of several existing constructions, and our idea is to adapt these constructions to CSS signals. The proposed matrix gives more attention to the clusters. Simulation results show the superior performance of our matrix: it gives the highest rate of exact recovery. Moreover, the deterministic nature of our matrix makes it better suited to hardware implementation.
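The abstract does not describe how the Kasami construction is adapted to clusters, so this sketch only shows the standard small Kasami set such a construction starts from, under assumed parameters (n = 4, primitive polynomial x^4 + x + 1): an m-sequence u of period 2^n - 1 = 15, its decimation w by 2^(n/2) + 1 = 5, and the family {u} plus {u XOR shift_k(w)}. Stacking every cyclic shift of every member as ±1 columns gives a deterministic 15 × 60 sensing matrix whose periodic correlations take only the values -1, -5 and 3, so its coherence is 5/15.

```python
N_BITS = 4                      # register length n (even, as the small Kasami set requires)
PERIOD = 2 ** N_BITS - 1        # 15
Q = 2 ** (N_BITS // 2) + 1      # decimation factor 5

def m_sequence():
    """Period-15 binary m-sequence from the primitive polynomial x^4 + x + 1,
    i.e. the recurrence s[t+4] = s[t+1] XOR s[t]."""
    s = [0, 0, 0, 1]
    while len(s) < PERIOD:
        s.append(s[-3] ^ s[-4])
    return s

def small_kasami_set():
    """u together with u XOR (each cyclic shift of the decimated sequence w)."""
    u = m_sequence()
    for phase in range(PERIOD):
        # Decimating u by Q gives a sequence of period 2^(n/2) - 1 = 3; some
        # phases yield the all-zero sequence, so keep the first nonzero one.
        w = [u[(Q * t + phase) % PERIOD] for t in range(PERIOD)]
        if any(w):
            break
    n_shifts = 2 ** (N_BITS // 2) - 1
    return [u] + [[bit ^ w[(t + k) % PERIOD] for t, bit in enumerate(u)]
                  for k in range(n_shifts)]

def periodic_correlation(a, b, shift):
    """Sum over one period of (-1)^(a[t] XOR b[t + shift])."""
    n = len(a)
    return sum(1 - 2 * (a[t] ^ b[(t + shift) % n]) for t in range(n))

def sensing_matrix():
    """PERIOD x (4 * PERIOD) matrix whose columns are all cyclic shifts of every
    family member, mapped 0 -> +1, 1 -> -1 and scaled to unit norm."""
    cols = [[(1 - 2 * seq[(t + s) % PERIOD]) / PERIOD ** 0.5 for t in range(PERIOD)]
            for seq in small_kasami_set() for s in range(PERIOD)]
    return [list(row) for row in zip(*cols)]

def coherence(matrix):
    """Largest absolute inner product between two distinct columns."""
    cols = list(zip(*matrix))
    return max(abs(sum(x * y for x, y in zip(cols[p], cols[q])))
               for p in range(len(cols)) for q in range(p + 1, len(cols)))

A = sensing_matrix()
print(len(A), "x", len(A[0]), "matrix, coherence", coherence(A))
```

Every entry is fixed by the shift register, so the matrix can be regenerated in hardware from a few bits of state instead of being stored, which is the usual argument for deterministic sensing matrices.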
Pub Date: 2019-05-01 DOI: 10.1109/ICASSP.2019.8683593 Pages: 1592-1596
Citations: 1
Embedding Physical Augmentation and Wavelet Scattering Transform to Generative Adversarial Networks for Audio Classification with Limited Training Resources
Kah Kuan Teh, T. H. Dat
This paper addresses audio classification with limited training resources. We first investigate different types of data augmentation, including physical modeling, the wavelet scattering transform and Generative Adversarial Networks (GANs). We then propose a novel GAN method that embeds physical augmentation and the wavelet scattering transform into the processing. Experimental results on the Google Speech Commands dataset show significant improvements from the proposed method when training with limited resources: it raises classification accuracy from the best ResNet baselines of 62.06% and 77.29% to as much as 91.96% and 93.38% when training with 10% and 25% of the training data, respectively.
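The abstract does not define its "physical modeling" augmentation. Purely as a hypothetical example of a physically motivated augmentation, the sketch below adds Gaussian noise to a clean signal at controlled signal-to-noise ratios; the SNR list, the 16 kHz sampling rate, the test tone and the function names are all assumptions.

```python
import math
import random

def power(signal):
    """Mean squared value of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so that 10*log10(P_clean / P_added) == snr_db."""
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]

def augment(clean, snrs_db=(20.0, 10.0, 5.0), seed=0):
    """Hypothetical augmentation: one noisy copy of `clean` per target SNR."""
    rng = random.Random(seed)
    copies = []
    for snr in snrs_db:
        noise = [rng.gauss(0.0, 1.0) for _ in clean]
        copies.append(mix_at_snr(clean, noise, snr))
    return copies

clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]  # 0.1 s tone at 16 kHz
noisy_copies = augment(clean)
print(len(noisy_copies), "augmented copies")
```

Scaling the noise by measured power, rather than by a fixed amplitude, keeps the SNR of each copy exact regardless of the loudness of the clean recording.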
Pub Date: 2019-05-01 DOI: 10.1109/ICASSP.2019.8683199 Pages: 3262-3266
Citations: 1
Joint parameter and state estimation for wave-based imaging and inversion
T. Leeuwen
In many applications, such as exploration geophysics, seismology and ultrasound imaging, waves are harnessed to image the interior of an object. The image formation process can be posed as a non-linear data-fitting problem: fit the coefficients of a wave equation such that its solution approximately fits the observations. This allows errors in the observations to be handled effectively.
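One common way to write this data-fitting problem (assumed here; the abstract gives no formulas) uses a discretized wave equation A(m)u = q with medium parameters m, wavefield (state) u, source q, a sampling operator P and observed data d. The first formulation below enforces the wave equation exactly and estimates only the parameters; the second estimates parameters and state jointly, lets the wave equation hold only approximately through a penalty weight λ, and recovers the first as λ → ∞.

```latex
\min_{\mathbf{m}}\ \tfrac{1}{2}\,\bigl\|P\,A(\mathbf{m})^{-1}\mathbf{q}-\mathbf{d}\bigr\|_2^2
\qquad\text{and}\qquad
\min_{\mathbf{m},\,\mathbf{u}}\ \tfrac{1}{2}\,\|P\mathbf{u}-\mathbf{d}\|_2^2
  +\tfrac{\lambda^2}{2}\,\|A(\mathbf{m})\mathbf{u}-\mathbf{q}\|_2^2
```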
Pub Date: 2017-03-01 DOI: 10.1109/ICASSP.2017.7953350 Pages: 6210-6214
Citations: 1