2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Simulating Dysarthric Speech for Training Data Augmentation in Clinical Speech Applications
Yishan Jiao, Ming Tu, Visar Berisha, J. Liss
Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications, where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications typically rely on small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time. In a pilot classification experiment, we show that using the simulated speech samples to balance an existing dataset improves classification accuracy by approximately 10%.
DOI: 10.1109/ICASSP.2018.8462290 (https://doi.org/10.1109/ICASSP.2018.8462290) · Pages 6009-6013 · Published 2018-04-26
Citations: 48
On the Comparison of Two Room Compensation / Dereverberation Methods Employing Active Acoustic Boundary Absorption
Jacob Donley, C. Ritz, W. Kleijn
In this paper, we compare the performance of two active dereverberation techniques using a planar array of microphones and loudspeakers. The two techniques are based on a solution to the Kirchhoff-Helmholtz Integral Equation (KHIE). We adapt a Wave Field Synthesis (WFS) based method to the application of real-time 3D dereverberation by using a low-latency pre-filter design. The use of First-Order Differential (FOD) models is also proposed as an alternative to the use of monopoles with WFS, one that does not assume knowledge of the room geometry or primary sources. The two methods are compared by observing the suppression of reflections off a single active wall over the volume of a room in the time and (temporal) frequency domains. The FOD method provides better suppression of reflections than the WFS based method, but at the expense of using higher order models. The equivalent absorption coefficients are comparable to those of passive fibre panel absorbers.
DOI: 10.1109/ICASSP.2018.8462318 (https://doi.org/10.1109/ICASSP.2018.8462318) · Pages 221-225 · Published 2018-04-26
Citations: 1
Deep Learning for Predicting Image Memorability
Hammad Squalli-Houssaini, Ngoc Q. K. Duong, Gwenaelle Marquant, C. Demarty
Memorability of media content such as images and videos has recently become an important research subject in computer vision. This paper presents our computational model for predicting image memorability, which is based on a deep learning architecture designed for a classification task. We exploit both convolutional neural network (CNN) based visual features and semantic features related to image captioning for the task. We train and test our model on the large-scale benchmark memorability dataset LaMem. Experimental results show that the proposed computational model obtains better prediction performance than the state of the art, and even outperforms human consistency. We further investigate the genericity of our model on other memorability datasets. Finally, by validating the model on interestingness datasets, we reconfirm the lack of correlation between the memorability and interestingness of images.
DOI: 10.1109/ICASSP.2018.8462292 (https://doi.org/10.1109/ICASSP.2018.8462292) · Pages 2371-2375 · Published 2018-04-25
Citations: 36
Cascade: Channel-Aware Structured Cosparse Audio Declipper
Clément Gaultier, N. Bertin, R. Gribonval
This work features a new algorithm, CASCADE, which leverages a structured cosparse prior across channels to address the multichannel audio declipping problem. The CASCADE technique outperforms the state-of-the-art method A-SPADE, applied on each channel separately, in all tested settings, while retaining a similar runtime.
DOI: 10.1109/ICASSP.2018.8461694 (https://doi.org/10.1109/ICASSP.2018.8461694) · Pages 571-575 · Published 2018-04-25
Citations: 5
Multiple-Input Neural Network-Based Residual Echo Suppression
Guillaume Carbajal, R. Serizel, E. Vincent, E. Humbert
A residual echo suppressor (RES) aims to suppress the residual echo in the output of an acoustic echo canceler (AEC). Spectral-based RES approaches typically estimate the magnitude spectra of the near-end speech and the residual echo from a single input, that is, either the far-end speech or the echo computed by the AEC, and derive the RES filter coefficients accordingly. These single inputs do not always suffice to discriminate the near-end speech from the remaining echo. In this paper, we propose a neural network-based approach that directly estimates the RES filter coefficients from multiple inputs, including the AEC output, the far-end speech, and/or the echo computed by the AEC. We evaluate our system on real recordings of acoustic echo and near-end speech acquired in various situations with a smart speaker. We compare it to two single-input spectral-based approaches in terms of echo reduction and near-end speech distortion.
DOI: 10.1109/ICASSP.2018.8461476 (https://doi.org/10.1109/ICASSP.2018.8461476) · Pages 231-235 · Published 2018-04-25
Citations: 37
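The conventional spectral-based step mentioned in the residual echo suppression abstract above (deriving RES filter coefficients from estimated magnitude spectra) can be illustrated with a per-bin gain. The sketch below assumes a Wiener-like gain, which is one common choice and is not confirmed by the abstract as the rule the authors compare against; the function names `res_gains` and `apply_gains` and the `floor` parameter are hypothetical.

```python
def res_gains(speech_mag, echo_mag, floor=1e-12):
    """Per-frequency-bin gain from estimated magnitude spectra.

    Assumption: a Wiener-like rule, gain = S^2 / (S^2 + R^2), where S is the
    estimated near-end speech magnitude and R the estimated residual echo
    magnitude. `floor` avoids division by zero when both are zero.
    """
    gains = []
    for s, r in zip(speech_mag, echo_mag):
        s2, r2 = s * s, r * r
        gains.append(s2 / (s2 + r2 + floor))
    return gains


def apply_gains(aec_output_mag, gains):
    """Scale each bin of the AEC output magnitude spectrum by its gain."""
    return [g * x for g, x in zip(gains, aec_output_mag)]


# Illustrative values only: three frequency bins.
gains = res_gains(speech_mag=[1.0, 0.0, 2.0], echo_mag=[0.0, 1.0, 2.0])
suppressed = apply_gains([1.0, 1.0, 4.0], gains)
```

Bins dominated by near-end speech get a gain near 1, and bins dominated by residual echo get a gain near 0. The approach in the abstract replaces this hand-derived rule with a neural network that estimates the coefficients directly from several inputs.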
Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition
Qing Wang, Wei Rao, Sining Sun, Lei Xie, Chng Eng Siong, Haizhou Li
The i-vector approach to speaker recognition has achieved good performance when the domain of the evaluation dataset is similar to that of the training dataset. However, in real-world applications, there is always a mismatch between the training and evaluation datasets, which leads to performance degradation. To address this problem, this paper proposes to learn domain-invariant and speaker-discriminative speech representations via domain adversarial training. Specifically, with the domain adversarial training method, we use a gradient reversal layer to remove the domain variation and project the data from different domains into the same subspace. Moreover, we compare the proposed method with other state-of-the-art unsupervised domain adaptation techniques for the i-vector approach to speaker recognition (e.g. autoencoder based domain adaptation, inter dataset variability compensation, dataset-invariant covariance normalization, and so on). Experiments on the 2013 domain adaptation challenge (DAC) dataset demonstrate that the proposed method is not only effective in solving the dataset mismatch problem, but also outperforms the compared unsupervised domain adaptation methods.
DOI: 10.1109/ICASSP.2018.8461423 (https://doi.org/10.1109/ICASSP.2018.8461423) · Pages 4889-4893 · Published 2018-04-25
Citations: 115
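The gradient reversal layer named in the domain adversarial training abstract above has a simple general definition: it passes features through unchanged in the forward pass and negates (optionally scales) the gradient in the backward pass. The following is a minimal framework-free sketch of that behaviour; the class name, the `scale` parameter, and the numeric values are hypothetical and not taken from the paper.

```python
class GradientReversal:
    """Identity in the forward pass; sign-flipped, scaled gradient in the backward pass."""

    def __init__(self, scale=1.0):
        # Hypothetical trade-off weight between speaker and domain objectives.
        self.scale = scale

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The feature extractor receives the negated gradient, so it is pushed
        # to make domains harder to tell apart.
        return [-self.scale * g for g in grad_from_domain_classifier]


grl = GradientReversal(scale=0.5)
passed = grl.forward([0.2, -1.0, 3.0])
reversed_grad = grl.backward([1.0, -2.0, 4.0])
```

In an automatic differentiation framework the same idea would be written as a custom operation with an overridden backward rule; the plain-list version here only shows the sign flip.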
Discriminative Clustering with Cardinality Constraints
Anh T. Pham, R. Raich, Xiaoli Z. Fern
Clustering is widely used for exploratory data analysis in a variety of applications. Traditionally, clustering is studied as an unsupervised task where no human inputs are provided. A recent trend in clustering is to leverage user-provided side information to better infer the clustering structure in data. In this paper, we propose a probabilistic graphical model that allows the user to provide as input the desired cluster sizes, namely the cardinality constraints. Our model also incorporates a flexible mechanism to control the crispness of the clusters. Experiments on synthetic and real data demonstrate the effectiveness of the proposed method in learning with cardinality constraints in comparison with the current state of the art.
DOI: 10.1109/ICASSP.2018.8461842 (https://doi.org/10.1109/ICASSP.2018.8461842) · Pages 2291-2295 · Published 2018-04-25
Citations: 1
Regressing Kernel Dictionary Learning
Kriti Kumar, A. Majumdar, M. G. Chandra, A. A. Kumar
In this paper, we present a kernelized dictionary learning framework for carrying out regression to model signals having a complex nonlinear nature. A joint optimization is carried out in which the regression weights are learnt together with the dictionary and coefficients. The relevant formulation and dictionary building steps are provided. To demonstrate the effectiveness of the proposed technique, detailed experimental results using different real-life datasets are presented. The results show that a non-linear dictionary is more accurate for data modeling and provides a significant improvement in estimation accuracy over other popular traditional techniques, especially when the data is highly non-linear.
DOI: 10.1109/ICASSP.2018.8462566 (https://doi.org/10.1109/ICASSP.2018.8462566) · Pages 2756-2760 · Published 2018-04-25
Citations: 4
Discriminative Probabilistic Framework for Generalized Multi-Instance Learning
Anh T. Pham, R. Raich, Xiaoli Z. Fern, Weng-Keen Wong, Xinze Guan
Multiple-instance learning is a framework for learning from data consisting of bags of instances labeled at the bag level. A common assumption in multi-instance learning is that a bag label is positive if and only if at least one instance in the bag is positive. In practice, this assumption may be violated. For example, experts may provide a noisy label to a bag consisting of many instances, to reduce labeling time. Here, we consider generalized multi-instance learning, which assumes that the bag label is non-deterministically determined based on the number of positive instances in the bag. The challenge in this setting is to simultaneously learn an instance classifier and the unknown bag-labeling probabilistic rule. This paper addresses generalized multi-instance learning using a discriminative probabilistic graphical model with exact and efficient inference. Experiments on both synthetic and real data illustrate the effectiveness of the proposed method relative to other methods, including those that follow the traditional multiple-instance learning assumption.
DOI: 10.1109/ICASSP.2018.8462099 (https://doi.org/10.1109/ICASSP.2018.8462099) · Pages 2281-2285 · Published 2018-04-25
Citations: 1
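The two labeling assumptions contrasted in the multi-instance learning abstract above can be written down directly. This sketch shows the traditional rule (a bag is positive if and only if at least one instance is positive) next to a generalized rule in which the probability of a positive bag label depends on the number of positive instances. The function names and the probability table `noisy_rule` are hypothetical; in the paper the rule is unknown and learned, whereas here it is fixed by hand for illustration.

```python
def standard_bag_label(instance_labels):
    """Traditional assumption: positive iff at least one instance is positive."""
    return int(any(instance_labels))


def generalized_bag_prob(instance_labels, rule):
    """Generalized assumption: P(bag positive) depends on the positive count.

    rule[k] is the assumed probability of a positive bag label given k
    positive instances; counts beyond the table reuse its last entry.
    """
    k = sum(1 for y in instance_labels if y)
    return rule[min(k, len(rule) - 1)]


# Hypothetical rule: 0 positives -> 0.05, 1 positive -> 0.4, 2 or more -> 0.9.
noisy_rule = [0.05, 0.4, 0.9]

bag = [0, 1, 0, 1]
hard_label = standard_bag_label(bag)
soft_prob = generalized_bag_prob(bag, noisy_rule)
```

Setting the table to `[0.0, 1.0]` recovers the traditional assumption as a special case.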
Compressive Sampling of Sound Fields Using Moving Microphones
Fabrice Katzberg, Radoslaw Mazur, M. Maass, P. Koch, A. Mertins
For conventional sampling of sound fields, measurement in space by use of stationary microphones is impractical for high audio frequencies. Satisfying the Nyquist-Shannon sampling theorem requires a huge number of sampling points and entails other difficulties, such as the need for exact calibration and spatial positioning of a large number of microphones. Dynamic sound-field measurements involving tracked microphones may weaken this spatial sampling problem. However, for aliasing-free reconstruction, there is still the need to sample a huge number of unknown sound-field variables. Thus, in real-world applications, the trajectories may be expected to lead to underdetermined sampling problems. In this paper, we present a compressed sensing framework that allows for stable and robust sub-Nyquist sampling of sound fields by use of moving microphones.
DOI: 10.1109/ICASSP.2018.8461519 (https://doi.org/10.1109/ICASSP.2018.8461519) · Pages 181-185 · Published 2018-04-24
Citations: 1
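The claim in the sound-field sampling abstract above, that stationary microphones need a huge number of sampling points at high frequencies, follows from the spatial form of the sampling theorem: grid spacing must not exceed half the shortest wavelength. The sketch below counts grid points for a rectangular volume under that rule. The speed of sound (343 m/s), the uniform rectangular grid, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def max_spacing(freq_hz, speed_of_sound=343.0):
    """Largest alias-free grid spacing: half the wavelength at freq_hz."""
    return speed_of_sound / (2.0 * freq_hz)


def grid_points(dims_m, freq_hz, speed_of_sound=343.0):
    """Number of points on a uniform grid covering a box of size dims_m (metres)."""
    d = max_spacing(freq_hz, speed_of_sound)
    n = 1
    for length in dims_m:
        # Intervals needed along this axis, plus one for the end point.
        n *= math.ceil(length / d) + 1
    return n


# A 1 m cube sampled for content up to 1 kHz and up to 8 kHz.
points_1k = grid_points((1.0, 1.0, 1.0), 1000.0)
points_8k = grid_points((1.0, 1.0, 1.0), 8000.0)
```

Because the count grows roughly with the cube of frequency, raising the upper frequency from 1 kHz to 8 kHz multiplies the number of positions by several hundred, which is the practical obstacle the abstract describes.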