
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Shift-invariant Subspace Tracking with Missing Data
Myung Cho, Yuejie Chi
Subspace tracking is an important problem in signal processing that finds applications in wireless communications, video surveillance, and source localization in radar and sonar. In recent years, it has been recognized that a low-dimensional subspace can be estimated and tracked reliably even when the data vectors are only partially observed with many missing entries, which is highly desirable for reducing the sampling requirement when processing high-dimensional, high-rate data. This paper is motivated by the observation that the underlying low-dimensional subspace may possess additional structural properties induced by the physical model of the data, which, if harnessed properly, can greatly improve subspace tracking performance. As a case study, this paper investigates the problem of tracking directions-of-arrival from subsampled observations in a uniform linear array, where the signals lie in a subspace spanned by the columns of a Vandermonde matrix. We exploit this shift-invariant structure by mapping each data vector to a latent Hankel matrix, and then perform tracking over the Hankel matrices by exploiting their low-rank properties. Numerical simulations validate the superiority of the proposed approach, in terms of tracking speed and agility, over existing subspace tracking methods that do not exploit the additional shift-invariant structure.
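As a rough illustration of the Hankel mapping the abstract describes (a generic sketch, not the authors' tracking algorithm; the snapshot length, pencil parameter, frequencies, and amplitudes below are our own choices), a data vector whose entries are a sum of r complex exponentials maps to a Hankel matrix of rank r:

```python
import numpy as np
from scipy.linalg import hankel

# Snapshot from r = 2 sources: x[k] = sum_i c_i * exp(1j * w_i * k),
# the Vandermonde / shift-invariant structure assumed in the abstract.
n, r = 32, 2
w = np.array([0.8, 1.9])                  # hypothetical spatial frequencies
c = np.array([1.0, 0.5])                  # hypothetical amplitudes
k = np.arange(n)
x = (c * np.exp(1j * np.outer(k, w))).sum(axis=1)

# Map the length-n vector to an L x (n - L + 1) Hankel matrix.
L = n // 2                                # pencil parameter (our choice)
H = hankel(x[:L], x[L - 1:])

# H has rank r: the low-rank property the proposed tracker exploits
# instead of tracking the raw vector x directly.
print(np.linalg.matrix_rank(H, tol=1e-8))  # -> 2
```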
{"title":"Shift-invariant Subspace Tracking with Missing Data","authors":"Myung Cho, Yuejie Chi","doi":"10.1109/ICASSP.2019.8683025","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683025","url":null,"abstract":"Subspace tracking is an important problem in signal processing that finds applications in wireless communications, video surveillance, and source localization in radar and sonar. In recent years, it is recognized that a low-dimensional subspace can be estimated and tracked reliably even when the data vectors are partially observed with many missing entries, which is greatly desirable when processing high-dimensional and high-rate data to reduce the sampling requirement. This paper is motivated by the observation that the underlying low-dimensional subspace may possess additional structural properties induced by the physical model of data, which if harnessed properly, can greatly improve subspace tracking performance. As a case study, this paper investigates the problem of tracking direction-of-arrivals from subsampled observations in a unitary linear array, where the signals lie in a subspace spanned by columns of a Vandermonde matrix. We exploit the shift-invariant structure by mapping the data vector to a latent Hankel matrix, and then perform tracking over the Hankel matrices by exploiting their low-rank properties. Numerical simulations are conducted to validate the superiority of the proposed approach over existing subspace tracking methods that do not exploit the additional shift-invariant structure in terms of tracking speed and agility.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"8222-8225"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81207682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
How Transferable Are Features in Convolutional Neural Network Acoustic Models across Languages?
J. Thompson, M. Schönwiesner, Yoshua Bengio, D. Willett
Characterization of the representations learned in intermediate layers of deep networks can provide valuable insight into the nature of a task and can guide the development of well-tailored learning strategies. Here we study convolutional neural network (CNN)-based acoustic models in the context of automatic speech recognition. Adapting a method proposed by [1], we measure the transferability of each layer between English, Dutch, and German to assess their language-specificity. We observed three distinct regions of transferability: (1) the first two layers were entirely transferable between languages, (2) layers 2–8 were also highly transferable but we found some evidence of language specificity, (3) the subsequent fully connected layers were more language-specific but could be successfully fine-tuned to the target language. To further probe the effect of weight freezing, we performed follow-up experiments using freeze training [2]. Our results are consistent with the observation that CNNs converge 'bottom up' during training and demonstrate the benefit of freeze training, especially for transfer learning.
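A minimal sketch of the freeze-and-fine-tune protocol the abstract studies, assuming a hypothetical PyTorch CNN (the architecture, layer count, and class count below are placeholders, not the paper's model):

```python
import torch
import torch.nn as nn

# Hypothetical CNN acoustic model: a stack of conv "layers" followed by
# a fully connected classifier, standing in for the paper's architecture.
class AcousticCNN(nn.Module):
    def __init__(self, n_layers=8, n_classes=42):
        super().__init__()
        convs = [nn.Sequential(nn.Conv2d(1 if i == 0 else 16, 16, 3, padding=1),
                               nn.ReLU()) for i in range(n_layers)]
        self.convs = nn.ModuleList(convs)
        self.fc = nn.Linear(16, n_classes)

    def forward(self, x):
        for layer in self.convs:
            x = layer(x)
        return self.fc(x.mean(dim=(2, 3)))   # global pooling, then classify

def transfer(model, k):
    """Freeze the first k conv layers (trained on the source language)
    and leave the rest trainable for the target language."""
    for layer in model.convs[:k]:
        for p in layer.parameters():
            p.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]

model = AcousticCNN()
trainable = transfer(model, k=2)   # e.g. the fully transferable bottom layers
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```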
{"title":"How Transferable Are Features in Convolutional Neural Network Acoustic Models across Languages?","authors":"J. Thompson, M. Schönwiesner, Yoshua Bengio, D. Willett","doi":"10.1109/ICASSP.2019.8683043","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683043","url":null,"abstract":"Characterization of the representations learned in intermediate layers of deep networks can provide valuable insight into the nature of a task and can guide the development of well-tailored learning strategies. Here we study convolutional neural network (CNN)-based acoustic models in the context of automatic speech recognition. Adapting a method proposed by [1], we measure the transferability of each layer between English, Dutch and German to assess their language-specificity. We observed three distinct regions of transferability: (1) the first two layers were entirely transferable between languages, (2) layers 2–8 were also highly transferable but we found some evidence of language specificity, (3) the subsequent fully connected layers were more language specific but could be successfully finetuned to the target language. To further probe the effect of weight freezing, we performed follow-up experiments using freeze-training [2]. Our results are consistent with the observation that CNNs converge ‘bottom up’ during training and demonstrate the benefit of freeze training, especially for transfer learning.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"56 5 1","pages":"2827-2831"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83391543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Generalized Boundary Detection Using Compression-based Analytics
Christina L. Ting, R. Field, T. Quach, Travis L. Bauer
We present a new method for boundary detection within sequential data using compression-based analytics. Our approach is to approximate the information distance between two adjacent sliding windows within the sequence. Large values in the distance metric are indicative of boundary locations. A new algorithm is developed, referred to as sliding information distance (SLID), that provides a fast, accurate, and robust approximation to the normalized information distance. A modified smoothed z-score algorithm is used to locate peaks in the distance metric, indicating boundary locations. A variety of data sources are considered, including text and audio, to demonstrate the efficacy of our approach.
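The paper's fast SLID algorithm itself is not reproduced here; as a hedged stand-in, the sketch below computes the classical zlib-based normalized compression distance between two adjacent sliding windows, which approximates the normalized information distance in the same spirit and peaks near a boundary. Window and step sizes are arbitrary choices:

```python
import zlib
import numpy as np

def csize(b: bytes) -> int:
    """Compressed size in bytes (zlib stands in for the compressor)."""
    return len(zlib.compress(b, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, a computable approximation
    to the normalized information distance."""
    cx, cy = csize(x), csize(y)
    return (csize(x + y) - min(cx, cy)) / max(cx, cy)

def boundary_scores(data: bytes, win: int = 512, step: int = 64):
    """Distance between two adjacent sliding windows at each offset;
    large values indicate candidate boundary locations."""
    return np.array([ncd(data[i:i + win], data[i + win:i + 2 * win])
                     for i in range(0, len(data) - 2 * win, step)])

# Two segments with different low-entropy statistics; the score curve
# peaks where the window pair straddles the true boundary (byte 4096).
data = b"ab" * 2048 + b"xyzw" * 1024
s = boundary_scores(data)
print(np.argmax(s) * 64)   # offset of the highest-scoring window pair
```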
{"title":"Generalized Boundary Detection Using Compression-based Analytics","authors":"Christina L. Ting, R. Field, T. Quach, Travis L. Bauer","doi":"10.1109/ICASSP.2019.8682257","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8682257","url":null,"abstract":"We present a new method for boundary detection within sequential data using compression-based analytics. Our approach is to approximate the information distance between two adjacent sliding windows within the sequence. Large values in the distance metric are indicative of boundary locations. A new algorithm is developed, referred to as sliding information distance (SLID), that provides a fast, accurate, and robust approximation to the normalized information distance. A modified smoothed z-score algorithm is used to locate peaks in the distance metric, indicating boundary locations. A variety of data sources are considered, including text and audio, to demonstrate the efficacy of our approach.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"3522-3526"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89339663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Transfer Learning Using Raw Waveform Sincnet for Robust Speaker Diarization
Harishchandra Dubey, A. Sangwan, J. Hansen
Speaker diarization determines who spoke and when in an audio stream. SincNet is a recently developed convolutional neural network (CNN) architecture whose first layer consists of parameterized sinc filters. Unlike conventional CNNs, SincNet takes the raw speech waveform as input. This paper leverages SincNet in a vanilla transfer learning (VTL) setup. Out-of-domain data is used to train SincNet-VTL to perform frame-level speaker classification. The trained SincNet-VTL is then used as a feature extractor for in-domain data. We investigated pooling strategies (max, avg) for deriving utterance-level embeddings from the frame-level features extracted by the trained network. These utterance/segment-level embeddings are adopted as speaker models during the clustering stage of the diarization pipeline. We compared the proposed SincNet-VTL embeddings with baseline i-vector features. We evaluated our approaches on two corpora, CRSS-PLTL and AMI. Results show the efficacy of the trained SincNet-VTL for speaker-discriminative embedding even when trained on a small amount of data. The proposed features achieved relative DER improvements of 19.12% and 52.07% over baseline i-vectors on the CRSS-PLTL and AMI data, respectively.
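A minimal sketch of the parameterized sinc filter that a SincNet-style first layer learns, assuming the usual difference-of-two-low-pass-sincs construction; the kernel length, sampling rate, and cutoff values below are made-up placeholders:

```python
import numpy as np

def sinc_bandpass(f1, f2, length=251, fs=16000):
    """Time-domain band-pass kernel of a SincNet-style first layer: the
    difference of two ideal low-pass sinc filters, windowed.  Only the
    two cutoffs f1 < f2 (Hz) are learnable; the values here are made up."""
    t = (np.arange(length) - (length - 1) / 2) / fs
    lp = lambda fc: 2 * fc * np.sinc(2 * fc * t)   # ideal low-pass, cutoff fc
    h = lp(f2) - lp(f1)                            # band-pass from f1 to f2
    return h * np.hamming(length)                  # window to reduce ripple

# A bank of such kernels, convolved with the raw waveform, replaces the
# usual unconstrained first convolutional layer.
kernel = sinc_bandpass(300.0, 3400.0)
print(kernel.shape)   # (251,)
```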
{"title":"Transfer Learning Using Raw Waveform Sincnet for Robust Speaker Diarization","authors":"Harishchandra Dubey, A. Sangwan, J. Hansen","doi":"10.1109/ICASSP.2019.8683023","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683023","url":null,"abstract":"Speaker diarization tells who spoke and whenƒ in an audio stream. SincNet is a recently developed novel convolutional neural network (CNN) architecture where the first layer consists of parameterized sinc filters. Unlike conventional CNNs, SincNet take raw speech waveform as input. This paper leverages SincNet in vanilla transfer learning (VTL) setup. Out-domain data is used for training SincNet-VTL to perform frame-level speaker classification. Trained SincNet-VTL is later utilized as feature extractor for in-domain data. We investigated pooling (max, avg) strategies for deriving utterance-level embedding using frame-level features extracted from trained network. These utterance/segment level embedding are adopted as speaker models during clustering stage in diarization pipeline. We compared the proposed SincNet-VTL embedding with baseline i-vector features. We evaluated our approaches on two corpora, CRSS-PLTL and AMI. Results show the efficacy of trained SincNet-VTL for speaker-discriminative embedding even when trained on small amount of data. Proposed features achieved relative DER improvements of 19.12% and 52.07% for CRSS-PLTL and AMI data, respectively over baseline i-vectors.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"19 1","pages":"6296-6300"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89391086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Accurate Vehicle Detection Using Multi-camera Data Fusion and Machine Learning
Hao Wu, Xinxiang Zhang, B. Story, D. Rajan
Computer-vision methods have been extensively used in intelligent transportation systems for vehicle detection. However, detecting severely occluded or partially observed vehicles due to limited camera fields of view remains a challenge. This paper presents a multi-camera vehicle detection system that significantly improves detection performance under occlusion. The key elements of the proposed method include a novel multi-view region proposal network that localizes candidate vehicles on the ground plane. We also infer the vehicle position on the ground plane by leveraging multi-view cross-camera context. Experiments are conducted on a dataset captured from a roadway in Richardson, TX, USA; the system attains an Average Precision (AP) of 0.7849 and a Multiple Object Detection Precision (MODP) of 0.7089, an increase of approximately 31.2% in AP and 8.6% in MODP over single-camera methods.
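The paper fuses multi-view information inside a region proposal network; the hedged sketch below only illustrates the underlying ground-plane reasoning, projecting per-camera detections to the ground plane with hypothetical camera-to-ground homographies and averaging:

```python
import numpy as np

def to_ground(H, u, v):
    """Project an image point (e.g. the foot point of a vehicle box)
    to ground-plane coordinates via a camera-to-ground homography H."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]

# Hypothetical homographies for two cameras (in practice, calibrated).
H_cam = [np.array([[0.02, 0.0, -5.0],
                   [0.0, 0.03, -8.0],
                   [0.0, 0.001, 1.0]]),
         np.array([[0.021, 0.001, -5.2],
                   [0.0, 0.029, -7.9],
                   [0.0, 0.001, 1.0]])]

# Foot points of the same vehicle as detected in each view.
detections = [(412.0, 300.0), (390.0, 310.0)]

# A minimal fusion rule: average the per-camera ground-plane estimates.
pts = np.array([to_ground(H, u, v) for H, (u, v) in zip(H_cam, detections)])
print(pts.mean(axis=0))
```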
{"title":"Accurate Vehicle Detection Using Multi-camera Data Fusion and Machine Learning","authors":"Hao Wu, Xinxiang Zhang, B. Story, D. Rajan","doi":"10.1109/ICASSP.2019.8683350","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683350","url":null,"abstract":"Computer-vision methods have been extensively used in intelligent transportation systems for vehicle detection. However, the detection of severely occluded or partially observed vehicles due to the limited camera fields of view remains a challenge. This paper presents a multi-camera vehicle detection system that significantly improves the detection performance under occlusion conditions. The key elements of the proposed method include a novel multi-view region proposal network that localizes the candidate vehicles on the ground plane. We also infer the vehicle position on the ground plane by leveraging multi-view cross-camera context. Experiments are conducted on dataset captured from a roadway in Richardson, TX, USA, and the system attains 0.7849 Average Precision and 0.7089 Multi Object Detection Precision. The proposed system results in an approximately 31.2% increase in AP and 8.6% in MODP than the single-camera methods.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"50 1","pages":"3767-3771"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90629206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Multi-channel Itakura Saito Distance Minimization with Deep Neural Network
M. Togami
We present a multi-channel speech source separation method using a deep neural network that jointly optimizes not only the time-varying variance of each speech source but also the multi-channel spatial covariance matrix, without any iterative optimization. Instead of a loss function that ignores the spatial characteristics of the output signal, the proposed method uses a loss function based on minimization of the multi-channel Itakura-Saito distance (MISD), which does evaluate those spatial characteristics. The MISD-based cost function is computed from the estimated posterior probability density function (PDF) of each speech source under a time-varying Gaussian distribution model, so the loss function of the neural network and the source PDF assumed in multi-channel speech source separation are consistent with each other. As its architecture, the proposed method uses multiple bidirectional long short-term memory (BLSTM) layers; the BLSTM layers and the subsequent complex-valued signal processing are jointly optimized in the training phase. Experimental results show that neural network parameters optimized by the proposed MISD minimization yield more accurately separated speech than parameters optimized with loss functions that do not evaluate the spatial covariance matrix.
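For reference, the single-channel Itakura-Saito divergence underlying the loss is sketched below; the paper's multi-channel MISD additionally compares spatial covariance matrices (via trace and log-determinant terms), which this toy version omits:

```python
import numpy as np

def is_divergence(p, p_hat, eps=1e-12):
    """Single-channel Itakura-Saito divergence between an observed power
    spectrogram p and a model p_hat, summed over time-frequency bins.
    It is zero iff p == p_hat and penalizes mismatched spectral levels."""
    r = (p + eps) / (p_hat + eps)
    return np.sum(r - np.log(r) - 1.0)

# Toy spectrograms: exact match gives 0; a global 2x mismatch does not.
rng = np.random.default_rng(0)
p = rng.random((100, 257)) + 0.1
print(is_divergence(p, p), is_divergence(p, 2 * p))
```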
{"title":"Multi-channel Itakura Saito Distance Minimization with Deep Neural Network","authors":"M. Togami","doi":"10.1109/ICASSP.2019.8683410","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683410","url":null,"abstract":"A multi-channel speech source separation with a deep neural network which optimizes not only the time-varying variance of a speech source but also the multi-channel spatial covariance matrix jointly without any iterative optimization method is shown. Instead of a loss function which does not evaluate spatial characteristics of the output signal, the proposed method utilizes a loss function based on minimization of multi-channel Itakura-Saito Distance (MISD), which evaluates spatial characteristics of the output signal. The cost function based on MISD is calculated by the estimated posterior probability density function (PDF) of each speech source based on a time-varying Gaussian distribution model. The loss function of the neural network and the PDF of each speech source that is assumed in multi-channel speech source separation are consistent with each other. As a neural-network architecture, the proposed method utilizes multiple bidirectional long-short term memory (BLSTM) layers. The BLSTM layers and the successive complex-valued signal processing are jointly optimized in the training phase. Experimental results show that more accurately separated speech signal can be obtained with neural network parameters optimized based on the proposed MISD minimization than that with neural network parameters optimized based on loss functions without spatial covariance matrix evaluation.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"109 1","pages":"536-540"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80652311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Deep Speaker Representation Using Orthogonal Decomposition and Recombination for Speaker Verification
I. Kim, Kyu-hong Kim, Ji-Whan Kim, Changkyu Choi
Speech signals contain intrinsic and extrinsic variations such as accent, emotion, dialect, phoneme, speaking manner, noise, music, and reverberation. Some of these variations are unnecessary and are unspecified factors of variation, and they lead to increased variability in speaker representations. In this paper, we assume that such unspecified factors of variation exist in speaker representations, and we attempt to minimize the resulting variability. The key idea is that a primal speaker representation can be decomposed into orthogonal vectors, which are then recombined by a deep neural network (DNN) to reduce speaker representation variability, yielding performance improvements for speaker verification (SV). The experimental results show that our proposed approach produces a 47.1% relative reduction in equal error rate (EER) compared to the same convolutional neural network (CNN) architecture on the VoxCeleb dataset. Furthermore, the proposed method provides a significant improvement for short utterances.
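A hedged numerical sketch of the decompose-then-recombine idea, assuming QR factorization for the orthogonal decomposition and a plain linear mixing in place of the paper's DNN recombination (all shapes and names below are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical primal speaker embedding, reshaped into a set of
# component vectors (shapes are our choice, not the paper's).
d, m = 64, 8
components = rng.standard_normal((d, m))

# Orthogonal decomposition: QR gives m orthonormal directions Q
# spanning the same subspace as the components.
Q, _ = np.linalg.qr(components)        # Q: (d, m), with Q.T @ Q = I

# Recombination: a weight vector mixes the orthogonal vectors back into
# a single low-variability embedding.  In the paper this mixing is done
# by a DNN; a linear combination stands in for it here.
w = rng.standard_normal(m)             # stand-in for learned weights
embedding = Q @ w

print(np.allclose(Q.T @ Q, np.eye(m)), embedding.shape)
```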
{"title":"Deep Speaker Representation Using Orthogonal Decomposition and Recombination for Speaker Verification","authors":"I. Kim, Kyu-hong Kim, Ji-Whan Kim, Changkyu Choi","doi":"10.1109/ICASSP.2019.8683332","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683332","url":null,"abstract":"Speech signal contains intrinsic and extrinsic variations such as accent, emotion, dialect, phoneme, speaking manner, noise, music, and reverberation. Some of these variations are unnecessary and are unspecified factors of variation. These factors lead to increased variability in speaker representation. In this paper, we assume that unspecified factors of variation exist in speaker representations, and we attempt to minimize variability in speaker representation. The key idea is that a primal speaker representation can be decomposed into orthogonal vectors and these vectors are recombined by using deep neural networks (DNN) to reduce speaker representation variability, yielding performance improvement for speaker verification (SV). The experimental results show that our proposed approach produces a relative equal error rate (EER) reduction of 47.1% compared to the use of the same convolutional neural network (CNN) architecture on the Vox-Celeb dataset. Furthermore, our proposed method provides significant improvement for short utterances.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"6126-6130"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79555275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Improving Graph Trend Filtering with Non-convex Penalties
R. Varma, Harlin Lee, Yuejie Chi, J. Kovacevic
In this paper, we study the denoising of piecewise-smooth graph signals that exhibit inhomogeneous levels of smoothness over a graph. We extend the graph trend filtering framework to a family of non-convex regularizers that exhibit superior recovery performance over existing convex ones. We present theoretical results in the form of asymptotic error rates for both generic and specialized graph models. We further present an ADMM-based algorithm to solve the proposed optimization problem and analyze its convergence. Numerical performance of the proposed framework with non-convex regularizers is presented for denoising, support recovery, and semi-supervised classification on both synthetic and real-world data.
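One concrete non-convex regularizer of the kind the abstract refers to is the minimax concave penalty (MCP); the sketch below gives the penalty and its proximal operator (firm thresholding), the shrinkage step an ADMM solver would apply in place of soft-thresholding. Whether the paper uses MCP specifically is an assumption here:

```python
import numpy as np

def mcp(x, lam, gamma):
    """Minimax concave penalty (MCP), a non-convex alternative to the
    l1 penalty used in graph trend filtering (gamma > 1)."""
    ax = np.abs(x)
    return np.where(ax <= gamma * lam,
                    lam * ax - ax**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def mcp_prox(x, lam, gamma):
    """Proximal operator of MCP ("firm thresholding"): zero out small
    values, shrink mid-range values, and, unlike the l1 prox, leave
    large graph differences unbiased -- the source of the improved
    recovery of sharp transitions."""
    ax = np.abs(x)
    inner = np.sign(x) * np.maximum(ax - lam, 0) / (1 - 1 / gamma)
    return np.where(ax <= gamma * lam, inner, x)

x = np.linspace(-3, 3, 7)
print(mcp_prox(x, lam=1.0, gamma=2.0))   # [-3. -2.  0.  0.  0.  2.  3.]
```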
{"title":"Improving Graph Trend Filtering with Non-convex Penalties","authors":"R. Varma, Harlin Lee, Yuejie Chi, J. Kovacevic","doi":"10.1109/ICASSP.2019.8683279","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683279","url":null,"abstract":"In this paper, we study the denoising of piecewise smooth graph signals that exhibit inhomogeneous levels of smoothness over a graph. We extend the graph trend filtering framework to a family of nonconvex regularizers that exhibit superior recovery performance over existing convex ones. We present theoretical results in the form of asymptotic error rates for both generic and specialized graph models. We further present an ADMM-based algorithm to solve the proposed optimization problem and analyze its convergence. Numerical performance of the proposed framework with non-convex regularizers on both synthetic and real-world data are presented for denoising, support recovery, and semi-supervised classification.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"11 1","pages":"5391-5395"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89840257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Exact Discrete-time Realizations of the Gammatone Filter
Elizabeth Ren, Hans-Andrea Loeliger
The paper derives an exact discrete-time state-space realization of the popular gammatone filter. No such realization appears to be available in the literature. The proposed realization is computationally attractive: a gammatone filter with exponent N requires fewer than 6N multiplications and additions per sample. The integer coefficients of the realization can be computed by a simple recursion. The proposed realization also yields a closed-form expression for the frequency response. The primary realization is not quite in a standard form, but it is easily transformed into another realization whose state transition matrix is in Jordan canonical form.
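The paper's own integer-coefficient recursion is not reproduced in this listing; as a generic illustration of why an exact rational (order-2N) discrete-time realization exists at all, the sketch below matches the sampled gammatone response by placing a complex pole of multiplicity N and solving for the numerator taps:

```python
import numpy as np
from scipy.signal import lfilter

# Discrete gammatone g[n] = n^(N-1) a^n cos(w0 n) (sampling step absorbed
# into a and w0): a sum of n^k p^n modes, realizable exactly by a rational
# filter with the pole p = a e^{j w0} (and its conjugate) of multiplicity N.
N = 4
a, w0 = 0.95, 2 * np.pi * 1000.0 / 16000.0   # made-up decay and frequency
p = a * np.exp(1j * w0)

# Denominator (1 - p z^-1)^N (1 - conj(p) z^-1)^N, expanded; order 2N.
den = np.poly(np.concatenate([np.full(N, p), np.full(N, np.conj(p))])).real

n = np.arange(8 * N)
target = n ** (N - 1) * a**n * np.cos(w0 * n)
imp = np.zeros(len(n)); imp[0] = 1.0

# The filter output is linear in the 2N numerator taps; matching the first
# 2N samples of the target pins them down, and the match is then exact for
# all n because both sides live in the same 2N-dimensional mode space.
A = np.column_stack([lfilter(np.eye(2 * N)[k], den, imp) for k in range(2 * N)])
num = np.linalg.solve(A[:2 * N], target[:2 * N])

print(np.allclose(lfilter(num, den, imp), target))   # -> True
```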
{"title":"Exact Discrete-time Realizations of the Gammatone Filter","authors":"Elizabeth Ren, Hans-Andrea Loeliger","doi":"10.1109/ICASSP.2019.8683073","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683073","url":null,"abstract":"The paper derives an exact discrete-time state space realization of the popular gammatone filter. No such realization appears to be available in the literature. The proposed realization is computationally attractive: a gammatone filter with exponent N requires less than 6N multiplications and additions per sample. The integer coefficients of the realization can be computed by a simple recursion. The proposed realization also yields a closed-form expression for the frequency response. The proposed primary realization is not quite in a standard form, but it is easily transformed into another realization whose state transition matrix is in Jordan canonical form.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"31 1","pages":"316-320"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89967426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Collaboration between Bordeaux-inp and Utp, from Research to Education, in the Field of Signal Processing
Fernando Merchan, Héctor Poveda, É. Grivel
The purpose of this paper is to share our positive experience of the collaboration launched a few years ago between UTP (Panama) and Bordeaux INP (France) in the field of signal processing. This collaboration involves both research and education activities. It has led to numerous internships of French students in Panama, researcher mobility, joint research papers, and the first double diploma signed between France and a Central American country. This paper presents these various aspects of the collaboration.
{"title":"Collaboration between Bordeaux-inp and Utp, from Research to Education, in the Field of Signal Processing","authors":"Fernando Merchan, Héctor Poveda, É. Grivel","doi":"10.1109/ICASSP.2019.8683079","DOIUrl":"https://doi.org/10.1109/ICASSP.2019.8683079","url":null,"abstract":"The purpose of this paper is to share our positive experience about the collaboration launched a few years ago between UTP (Panama) and Bordeaux INP (France) in the field of signal processing. This collaboration involves research and education activities. This has led to numerous internships of French students in Panama, mobilities of researchers, common research papers, and the 1st double diploma signed between France and a country of Central America. Thus, this paper presents the various aspects of the collaboration.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"7645-7649"},"PeriodicalIF":0.0,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75754165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0