
Latest publications: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Joint Separation and Dereverberation of Reverberant Mixtures with Multichannel Variational Autoencoder
S. Inoue, H. Kameoka, Li Li, Shogo Seki, S. Makino
In this paper, we deal with a multichannel source separation problem under a highly reverberant condition. The multichannel variational autoencoder (MVAE) is a recently proposed source separation method that employs the decoder distribution of a conditional VAE (CVAE) as the generative model for the complex spectrograms of the underlying source signals. Although MVAE is notable in that it can significantly improve the source separation performance compared with conventional methods, its capability to separate highly reverberant mixtures is still limited since MVAE uses an instantaneous mixture model. To overcome this limitation, in this paper we propose extending MVAE to simultaneously solve source separation and dereverberation problems by formulating the separation system as a frequency-domain convolutive mixture model. A convergence-guaranteed algorithm based on the coordinate descent method is derived for the optimization. Experimental results revealed that the proposed method outperformed the conventional methods in terms of all the source separation criteria in highly reverberant environments.
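To make the modelling step concrete, here is a minimal NumPy sketch of a frequency-domain convolutive mixture, the observation model the abstract contrasts with MVAE's instantaneous model. The function name and tensor layout are our own assumptions, not the paper's notation.

```python
import numpy as np

def convolutive_mix(S, A):
    """Frequency-domain convolutive mixture.

    S: (F, T, N) complex source spectrograms
    A: (F, L, M, N) per-frequency mixing filters with L taps
    Returns X: (F, T, M) observed mixtures, where
    X[f, t] = sum_tau A[f, tau] @ S[f, t - tau].
    """
    F, T, N = S.shape
    _, L, M, _ = A.shape
    X = np.zeros((F, T, M), dtype=complex)
    for f in range(F):
        for tau in range(L):
            # shift the sources by tau frames and apply the tau-th filter tap
            shifted = np.zeros((T, N), dtype=complex)
            shifted[tau:] = S[f, : T - tau]
            X[f] += shifted @ A[f, tau].T
    return X
```

With L = 1 this collapses to X[f, t] = A[f] S[f, t], the instantaneous mixture model whose limitation in long-reverberation conditions motivates the paper.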
DOI: 10.1109/ICASSP.2019.8683497 · pp. 96-100
Citations: 15
Single-channel Speech Extraction Using Speaker Inventory and Attention Network
Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Hakan Erdogan, Changliang Liu, D. Dimitriadis, J. Droppo, Y. Gong
Neural network-based speech separation has received a surge of interest in recent years. Previously proposed methods are either speaker independent or extract a target speaker’s voice by using his or her voice snippet. In applications such as home devices or office meeting transcription, a list of possible speakers is available and can be leveraged for speech separation. This paper proposes a novel speech extraction method that utilizes an inventory of voice snippets of possible interfering speakers, or speaker enrollment data, in addition to that of the target speaker. Furthermore, an attention-based network architecture is proposed to form time-varying masks for both the target and other speakers during the separation process. This architecture does not reduce the enrollment audio of each speaker to a single vector, thereby allowing each short time frame of the input mixture signal to be aligned and accurately compared with the enrollment signals. We evaluate the proposed system on a speaker extraction task derived from the Libri corpus and show the effectiveness of the method.
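The key architectural point is that each mixture frame attends over all enrollment frames rather than over a single pooled speaker vector. A minimal sketch of that cross-attention step, with hypothetical embedding shapes of our own choosing:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frame_attention(mix_emb, enroll_emb):
    """Attend each mixture frame over all enrollment frames.

    mix_emb: (T, D) embeddings of the mixture frames
    enroll_emb: (Te, D) embeddings of one speaker's enrollment audio
    Returns (T, D): a per-frame speaker context, computed without
    pooling the enrollment down to a single vector.
    """
    scores = mix_emb @ enroll_emb.T / np.sqrt(mix_emb.shape[1])  # (T, Te)
    weights = softmax(scores, axis=1)  # each mixture frame picks its best matches
    return weights @ enroll_emb
```

In the paper's setting such a context would feed a mask-estimation head, one per speaker in the inventory; this sketch only shows the frame-level alignment idea.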
DOI: 10.1109/ICASSP.2019.8682245 · pp. 86-90
Citations: 59
Non-local Self-attention Structure for Function Approximation in Deep Reinforcement Learning
Z. Wang, Xi Xiao, Guangwu Hu, Yao Yao, Dianyan Zhang, Zhendong Peng, Qing Li, Shutao Xia
Reinforcement learning is a framework for making sequential decisions. Combining it with deep neural networks further improves its capability. Convolutional neural networks make it possible to make sequential decisions directly from raw pixel information, allowing reinforcement learning to achieve satisfying performance in a series of tasks. However, convolutional neural networks still have limitations in representing the geometric patterns and long-term dependencies that occur consistently in state inputs. To tackle this limitation, we propose a self-attention architecture to augment the original network. It provides a better balance between the ability to model long-range dependencies and computational efficiency. Experiments on Atari games illustrate that the self-attention structure is significantly effective for function approximation in deep reinforcement learning.
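A non-local self-attention block lets every feature position aggregate information from all others in a single step, which is how long-range dependencies are captured without extra depth. A minimal NumPy sketch under our own assumptions (random projections standing in for learned weights, flattened spatial positions):

```python
import numpy as np

def nonlocal_block(x, W_theta, W_phi, W_g):
    """Non-local self-attention over feature positions.

    x: (N, C) features at N (flattened) spatial positions
    W_theta, W_phi, W_g: (C, C) projection matrices (learned in practice)
    Each position is updated with a softmax-weighted sum over ALL
    positions, so a long-range dependency costs one matrix product.
    """
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g
    attn = theta @ phi.T / np.sqrt(x.shape[1])       # (N, N) pairwise affinities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # row-wise softmax
    return x + attn @ g                              # residual connection
```

The residual form means the block can be dropped into an existing convolutional value or policy network without disturbing its initial behaviour when W_g starts near zero.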
DOI: 10.1109/ICASSP.2019.8682832 · pp. 3042-3046
Citations: 0
A Spiking Neural Network Approach to Auditory Source Lateralisation
R. Luke, D. McAlpine
A novel approach to multi-microphone acoustic source localisation based on spiking neural networks is presented. We demonstrate that a two-microphone system connected to a spiking neural network can be used to localise acoustic sources based purely on inter-microphone timing differences, with no need for manually configured delay lines. A two-sensor example is provided which includes 1) a front end which converts the acoustic signal into a series of spikes, 2) a hidden layer of spiking neurons, and 3) an output layer of spiking neurons which represents the location of the acoustic source. We present details on training the network and an evaluation of its performance in quiet and noisy conditions. The system is trained on two locations, and we show that the lateralisation accuracy is 100% when presented with previously unseen data in quiet conditions. We also demonstrate that the network generalises to modulation rates and background noise on which it was not trained.
DOI: 10.1109/ICASSP.2019.8683767 · pp. 1488-1492
Citations: 3
Neural Variational Identification and Filtering for Stochastic Non-linear Dynamical Systems with Application to Non-intrusive Load Monitoring
Henning Lange, M. Berges, J. Z. Kolter
In this paper, an algorithm for performing System Identification and inference of the filtering recursion for stochastic non-linear dynamical systems is introduced. Additionally, the algorithm allows for enforcing domain constraints on the state variable. The algorithm makes use of an approximate inference technique called Variational Inference in conjunction with Deep Neural Networks as the optimization engine. Although general in nature, the algorithm is evaluated in the context of Non-Intrusive Load Monitoring: the problem of inferring the operational state of individual electrical appliances given aggregate measurements of electrical power collected in a home.
DOI: 10.1109/ICASSP.2019.8683552 · pp. 8340-8344
Citations: 6
The Geometry of Equality-constrained Global Consensus Problems
Qiuwei Li, Zhihui Zhu, Gongguo Tang, M. Wakin
A variety of unconstrained nonconvex optimization problems have been shown to have benign geometric landscapes that satisfy the strict saddle property and have no spurious local minima. We present a general result relating the geometry of an unconstrained centralized problem to its equality-constrained distributed extension. It follows that many global consensus problems inherit the benign geometry of their original centralized counterpart. Taking advantage of this fact, we demonstrate the favorable performance of the Gradient ADMM algorithm on a distributed low-rank matrix approximation problem.
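The equality-constrained distributed extension the abstract refers to is the global consensus form: minimize the sum of local objectives f_i(x_i) subject to x_i = z for all i. A toy sketch of that splitting, solved with plain scaled-form consensus ADMM on quadratics (this is not the paper's Gradient ADMM variant, and the objective is a stand-in of our own choosing):

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=500):
    """Global consensus for f_i(x) = 0.5 * (x - a_i)^2.

    Solves min_x sum_i f_i(x) by giving each agent a local copy x_i
    with the consensus constraints x_i = z (scaled-form ADMM updates).
    """
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(a)   # local variables, one per agent
    u = np.zeros_like(a)   # scaled dual variables for x_i = z
    z = 0.0                # global (consensus) variable
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1 + rho)   # local proximal steps
        z = np.mean(x + u)                    # consensus averaging
        u = u + x - z                         # dual update on x_i = z
    return z
```

For this quadratic instance the consensus solution is the average of the a_i, matching the centralized minimizer, which is exactly the kind of "inherited" behaviour the geometric result in the paper makes precise for nonconvex objectives.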
DOI: 10.1109/ICASSP.2019.8682568 · pp. 7928-7932
Citations: 6
Information Theoretic Lower Bound of Restricted Isometry Property Constant
Gen Li, Jingkai Yan, Yuantao Gu
Compressed sensing seeks to recover an unknown sparse vector from undersampled measurements. Since its introduction, a large body of work on compressed sensing has developed efficient algorithms for sparse signal recovery. The restricted isometry property (RIP) has become the dominant tool used for the analysis of exact reconstruction from seemingly undersampled measurements. Although the upper bound of the RIP constant has been studied extensively, as far as we know, a corresponding result for the lower bound is missing. In this work, we first present a tight lower bound for the RIP constant, filling that gap. The lower bound is of the same order as the upper bound for the RIP constant. Moreover, numerical simulations show that our lower bound is close to the upper bound. Our bound on the RIP constant provides, for the first time, an information-theoretic lower bound on the sampling rate, which is the essential question for practitioners.
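For readers unfamiliar with the quantity being bounded: the s-th RIP constant of a matrix A is the smallest delta such that (1 - delta)||x||² ≤ ||Ax||² ≤ (1 + delta)||x||² for all s-sparse x. For tiny problems it can be computed exactly by enumerating supports, as this sketch (our own helper, feasible only for small dimensions) shows:

```python
import numpy as np
from itertools import combinations

def rip_constant(A, s):
    """Exact s-th RIP constant of A by enumerating all size-s supports.

    delta_s = max over supports S of the largest deviation of the
    eigenvalues of A_S^T A_S from 1.
    """
    n = A.shape[1]
    delta = 0.0
    for S in combinations(range(n), s):
        G = A[:, S].T @ A[:, S]          # Gram matrix of the selected columns
        w = np.linalg.eigvalsh(G)        # sorted eigenvalues
        delta = max(delta, max(1 - w[0], w[-1] - 1))
    return delta
```

The combinatorial blow-up of this exact computation is precisely why the field relies on upper and, as in this paper, lower bounds instead.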
DOI: 10.1109/ICASSP.2019.8683742 · pp. 5297-5301
Citations: 1
Context Modelling Using Hierarchical Attention Networks for Sentiment and Self-assessed Emotion Detection in Spoken Narratives
Lukas Stappen, N. Cummins, Eva-Maria Messner, H. Baumeister, J. Dineley, Björn Schuller
Automatic detection of sentiment and affect in personal narratives through word usage has the potential to assist in the automated detection of change in psychotherapy. Such a tool could, for instance, provide an efficient, objective measure of the time a person has been in a positive or negative state-of-mind. Towards this goal, we propose and develop a hierarchical attention model for the tasks of sentiment (positive and negative) and self-assessed affect detection in transcripts of personal narratives. We also perform a qualitative analysis of the word attentions learnt by our sentiment analysis model. In a key result, our attention model achieved an unweighted average recall (UAR) of 91.0% in a binary sentiment detection task on the test partition of the Ulm State-of-Mind in Speech (USoMS) corpus. We also achieved UARs of 73.7% and 68.6% in the 3-class tasks of arousal and valence detection, respectively. Finally, our qualitative analysis associates colloquial reinforcements with positive sentiments, and uncertain phrasing with negative sentiments.
DOI: 10.1109/ICASSP.2019.8683801 · pp. 6680-6684
Citations: 12
Speech Augmentation Using Wavenet in Speech Recognition
Jisung Wang, Sangki Kim, Yeha Lee
Data augmentation is crucial to improving the performance of deep neural networks by helping the model avoid overfitting and improve its generalization. In automatic speech recognition, previous work proposed several approaches to augment data by performing speed perturbation or spectral transformation. Since data augmented in this manner has similar acoustic representations as the original data, it has limited advantage in improving generalization of the acoustic model. In order to avoid generating data with limited diversity, we propose a voice conversion approach using a generative model (WaveNet), which generates a new utterance by transforming an utterance to a given target voice. Our method synthesizes speech with diverse pitch patterns by minimizing the use of acoustic features. With the Wall Street Journal dataset, we verify that our method led to better generalization compared to other data augmentation techniques such as speed perturbation and WORLD-based voice conversion. In addition, when combined with the speed perturbation technique, the two methods complement each other to further improve performance of the acoustic model.
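For context, the speed-perturbation baseline the paper compares against can be sketched in a few lines: resampling the waveform changes its duration (and pitch), yielding acoustically similar variants of the original data, which is exactly the limited-diversity issue the WaveNet-based conversion is meant to avoid. This is a generic illustration, not the paper's implementation:

```python
import numpy as np

def speed_perturb(x, factor):
    """Speed-perturb a waveform by `factor` via linear-interpolation resampling.

    factor > 1 speeds the audio up (shorter output); factor < 1 slows it down.
    Pitch shifts along with speed, as in plain resampling-based perturbation.
    """
    n_out = int(round(len(x) / factor))
    t_out = np.arange(n_out) * factor        # fractional read positions
    return np.interp(t_out, np.arange(len(x)), x)
```

Typical ASR recipes apply factors such as 0.9, 1.0 and 1.1 to triple the training data; the paper's point is that such variants stay close to the original acoustics.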
DOI: 10.1109/ICASSP.2019.8683388 · pp. 6770-6774
Citations: 10
A Hybrid Method for Blind Estimation of Frequency Dependent Reverberation Time Using Speech Signals
Song Li, Roman Schlieper, J. Peissig
Reverberation time is an important room acoustical parameter that can be used to identify the acoustic environment, predict speech intelligibility and model the late reverberation for binaural rendering, etc. Several blind estimation algorithms of reverberation time have been proposed by analyzing recorded speech signals. Unfortunately, the estimation accuracy for the frequency dependent reverberation time is lower than for the full-band reverberation time due to the lower signal energy in sub-band filters. This study presents a novel approach for the blind estimation of reverberation time in the full frequency range. The maximum likelihood method is applied for the estimation of the reverberation time from low- to mid-frequencies, and the reverberation time from mid- to high-frequencies is predicted by our proposed model based on the analysis of the reverberation time calculated from room impulse responses in different rooms. The proposed method is validated by two experiments and shows a good performance.
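As a reference point for what is being estimated blindly: given a measured room impulse response, reverberation time (T60) is conventionally obtained from the Schroeder backward-integrated energy decay curve. A minimal sketch of that non-blind computation (fit limits and names are our own; the paper itself works without access to the impulse response):

```python
import numpy as np

def t60_from_rir(h, fs, db_lo=-5.0, db_hi=-35.0):
    """T60 from a room impulse response via Schroeder backward integration.

    Builds the energy decay curve EDC(t) = sum_{s>=t} h(s)^2, fits a line
    to it (in dB) between db_lo and db_hi, and extrapolates to -60 dB.
    """
    edc = np.cumsum(h[::-1] ** 2)[::-1]            # backward integration
    edc_db = 10 * np.log10(edc / edc[0])           # normalised decay in dB
    t = np.arange(len(h)) / fs
    mask = (edc_db <= db_lo) & (edc_db >= db_hi)   # linear-decay region
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope                           # time to decay by 60 dB
```

Blind methods like the one in this paper aim to recover the same quantity per frequency band from reverberant speech recordings alone.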
DOI: 10.1109/ICASSP.2019.8682661 · pp. 211-215
Citations: 4