Latest papers from the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Using optimal transport for estimating inharmonic pitch signals

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7952172

Filip Elvander, Stefan Ingi Adalbjornsson, J. Karlsson, A. Jakobsson

In this work, we propose a novel multi-pitch estimation technique that is robust with respect to the inharmonicity commonly occurring in many applications. The method does not require any a priori knowledge of the number of signal sources, the number of harmonics of each source, nor the structure or scope of any possibly occurring inharmonicity. Formulated as a minimum transport distance problem, the proposed method finds an estimate of the present pitches by mapping any found spectral line to the closest harmonic structure. The resulting optimization is a convex and highly tractable linear programming problem. The preferable performance of the proposed method is illustrated using both simulated and real audio signals.

Citations: 10

An autoregressive recurrent mixture density network for parametric speech synthesis

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7953087

Xin Wang, Shinji Takaki, J. Yamagishi

Neural-network-based generative models, such as mixture density networks, are potential solutions for speech synthesis. In this paper we follow this path and propose a recurrent mixture density network that incorporates a trainable autoregressive model. An advantage of incorporating an autoregressive model is that the time dependency within acoustic feature trajectories can be modeled without using the conventional dynamic features. More interestingly, experiments show that this autoregressive model learns to be a filter that emphasizes the high frequency components of the target acoustic feature trajectories in the training stage. In the synthesis stage, it boosts the low frequency components of the generated feature trajectories and hence increases their global variance. Experimental results show that the proposed model achieved higher likelihood on the training data and generated speech with better quality than other models when dynamic features were not utilized in any model.

Citations: 51

Multicore distributed dictionary learning: A microarray gene expression biclustering case study

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7952340

Stephen Laide, J. McAllister

The increasing pervasiveness and scale of machine learning technologies are posing fundamental challenges for their realisation. In the main, current algorithms are centralised, with a large number of processing agents, distributed across parallel processing resources, accessing a single, very large data object. This creates bottlenecks as a result of limited memory access rates. Distributed learning has the potential to resolve this problem by employing networks of co-operating agents, each operating on a subset of the data, but as yet its suitability for realisation on parallel architectures such as multicore is unknown. This paper presents the results of a case study deploying distributed dictionary learning for microarray gene expression bi-clustering on a 16-core Epiphany multicore. It shows that distributed learning approaches can enable near-linear speed-up with the number of processing resources and that DMA-based communication can enable a 50% increase in throughput.

Citations: 1

Compressive sensing strategy for classification of bearing faults

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-16 DOI: 10.1109/ICASSP.2017.7952543

H. Ahmed, M. Wong, A. Nandi

Owing to the importance of rolling element bearings in rotating machines, condition monitoring of rolling element bearings has been studied extensively over the past decades. However, most of the existing techniques require large storage and time for signal processing. This paper presents a new strategy based on compressive sensing for bearing fault classification that uses fewer measurements. Under this strategy, to match the compressed sensing mechanism, the compressed vibration signals are first obtained by resampling the acquired bearing vibration signals in the time domain with a random Gaussian matrix using different compressed sensing sampling rates. Three approaches are then used to process these compressed data for bearing fault classification: using the data directly as the classifier input, and extracting features from the data with one of two linear feature extraction methods, namely unsupervised Principal Component Analysis (PCA) and supervised Linear Discriminant Analysis (LDA). Classification with a Logistic Regression Classifier (LRC) achieved high accuracy with significantly reduced bandwidth consumption compared with the existing techniques.
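
The compression step described in this abstract, projecting a time-domain signal through a random Gaussian matrix at a chosen sampling rate, might look roughly like the sketch below. It is only an illustration, not the authors' implementation: the function names, the `rate` parameter, the fixed seed, and the 1/sqrt(m) scaling of the matrix entries are all assumptions made here.

```python
import random


def gaussian_measurement_matrix(m, n, seed=0):
    """Build an m-by-n matrix with i.i.d. Gaussian entries.

    The 1/sqrt(m) scaling is a common convention and an assumption here.
    """
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def compress(signal, rate, seed=0):
    """Project a length-n signal onto m = round(rate * n) random measurements."""
    n = len(signal)
    m = max(1, int(round(rate * n)))
    phi = gaussian_measurement_matrix(m, n, seed)
    # Each compressed sample is the inner product of one matrix row with the signal.
    return [sum(p * x for p, x in zip(row, signal)) for row in phi]
```

With a hypothetical rate of 0.1, a 100-sample vibration frame is reduced to 10 measurements, which could then be passed to a classifier directly or after PCA or LDA.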

Citations: 12

Projection-based dual averaging for stochastic sparse optimization

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-16 DOI: 10.1109/ICASSP.2017.7952568

Asahi Ushio, M. Yukawa

We present a variant of the regularized dual averaging (RDA) algorithm for stochastic sparse optimization. Our approach differs from the previous studies of RDA in two respects. First, a sparsity-promoting metric originating from the proportionate-type adaptive filtering algorithms is employed. Second, the squared-distance function to a closed convex set is employed as a part of the objective functions. In the particular application of online regression, the squared-distance function is reduced to a normalized version of the typical squared-error (least square) function. The two differences yield a better sparsity-seeking capability, leading to improved convergence properties. Numerical examples show the advantages of the proposed algorithm over the existing methods including ADAGRAD and adaptive proximal forward-backward splitting (APFBS).
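
To illustrate how a squared-distance function can reduce to a normalized squared error, the sketch below assumes the closed convex set is the hyperplane of weight vectors that fit one sample exactly, {v : x·v = y}. That choice of set is an assumption made here for illustration; the abstract does not specify it. Under that assumption the squared Euclidean distance from a weight vector w to the set is (x·w − y)² / ‖x‖².

```python
def squared_distance_to_hyperplane(w, x, y):
    """Squared Euclidean distance from w to the hyperplane {v : x.v = y}.

    Equals the squared regression error normalized by the squared
    norm of the input vector x (x must be nonzero).
    """
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    norm_sq = sum(xi * xi for xi in x)
    return residual * residual / norm_sq
```

For example, with w = [0, 0], x = [3, 4] and y = 5, the plain squared error is 25 while the normalized value is 1.0.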

Citations: 3

Coherence-adjusted monopole dictionary and convex clustering for 3D localization of mixed near-field and far-field sources

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-16 DOI: 10.1109/ICASSP.2017.7952745

T. Tachikawa, K. Yatabe, Yasuhiro Oikawa

In this paper, a 3D sound source localization method is proposed that simultaneously estimates both the direction-of-arrival (DOA) and the distance from the microphone array. For estimating distance, the off-grid problem must be overcome because the range of distances to be considered is quite broad and may even be unbounded. The proposed method estimates positions based on an extension of the convex clustering method combined with sparse coefficient estimation. A method for constructing a suitable monopole dictionary based on coherence is also proposed so that the convex-clustering-based method appropriately estimates the distance of sound sources. Numerical experiments on distance estimation and 3D localization show the feasibility of the proposed method.

Citations: 8

Normal-to-shouted speech spectral mapping for speaker recognition under vocal effort mismatch

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-16 DOI: 10.1109/ICASSP.2017.7953096

Ana Ramírez López, R. Saeidi, Lauri Juvela, P. Alku

Speaker recognition performance degrades substantially in case of vocal effort mismatch (e.g. shouted vs. normal speech) between test and enrollment utterances. Such a mismatch is often encountered, for example, in forensic speaker recognition. This paper introduces a novel spectral mapping method which, when employed jointly with a statistical mapping technique, converts the Mel-frequency band energies of normal speech towards their counterparts in shouted speech. The aim is to obtain more robust performance in speaker recognition by tackling vocal effort mismatch between enrollment and test utterances. The processing is performed on the speech signal before feature extraction. The proposed approach was evaluated by testing the performance of a state-of-the-art i-vector-based speaker recognition system with and without applying the spectral mapping processing to the enrollment data. The results show that pre-processing with the proposed approach results in considerable improvement in correct identification rates.

Citations: 6

Time-multiplexed / superimposed pilot selection for massive MIMO pilot decontamination

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-16 DOI: 10.1109/ICASSP.2017.7952799

Karthik Upadhya, S. Vorobyov, Mikko Vehkaperä

In massive multiple-input multiple-output (MIMO) systems, superimposed (SP) and time-multiplexed (TM) pilots exhibit a complementary behavior, with the former and latter schemes offering a higher throughput in high and low inter-cell interference scenarios, respectively. Based on this observation, in this paper, we propose an algorithm for partitioning users into two disjoint sets comprising users that transmit TM and SP pilots. This selection of user sets is accomplished by minimizing the total inter-cell and intra-cell interference, and since this problem is found to be non-convex, a greedy approach is proposed to perform the partitioning. Based on simulations, it is shown that the proposed method is versatile and offers an improved performance in both high and low-interference scenarios.

Citations: 2

Novel medical video compression methods over lossless HEVC coder

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-16 DOI: 10.1109/ICASSP.2017.7952444

Yao-Jen Chang, Pei-Hsuan Tsai, Chun-Lung Lin

To enable medical video applications, this paper proposes several lossless compression methods built on high efficiency video coding (HEVC). A generalized intra block copy (GIBC) is first proposed to predict the coding unit from a reference block whose samples may be fully or partially reconstructed. A cyclic block padding technique is also proposed to predict the unreconstructed samples in the reference block from geometrically co-located blocks. Based on feature distribution analyses for palette coding, we further propose an HEVC-based medical video coder (HMC), which combines the GIBC, line-coded palette coding and an intra palette predictor without mutual conflicts. Experimental results show that, compared to lossless HEVC, the proposed GIBC and HMC save up to 13.9% and 22.3% of the bits, respectively, on medical videos.

Citations: 2

Speech emotion recognition with ensemble learning methods

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-16 DOI: 10.1109/ICASSP.2017.7952658

Po-Yuan Shih, Chia-Ping Chen, Chung-Hsien Wu

In this paper, we propose to apply ensemble learning methods to neural networks to improve the performance of speech emotion recognition tasks. The basic idea is to first divide an unbalanced data set into balanced subsets and then combine the predictions of the models trained on these subsets. Several methods regarding the decomposition of data and the exploitation of model predictions are investigated in this study. On the public-domain FAU-Aibo database, which is used in the Interspeech Emotion Challenge evaluation, the best performance we achieve is an unweighted average (UA) recall rate of 45.5% for the 5-class classification task. Furthermore, this performance is achieved with a 40-dimensional feature space. Compared to the baseline system, with a 384-dimensional feature vector per example and a UA of 38.9%, such a performance is very impressive. Indeed, this is one of the best performances on FAU-Aibo within the static modeling framework.
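
The basic idea stated in this abstract, splitting an unbalanced data set into balanced subsets and combining the predictions of per-subset models, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: random undersampling of each class to the size of the smallest class and majority voting are just one possible choice of decomposition and combination, and both function names are hypothetical.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_subsets, seed=0):
    """Return n_subsets lists of example indices, each class-balanced.

    Every class is randomly undersampled to the size of the smallest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    size = min(len(idxs) for idxs in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for idxs in by_class.values():
            subset.extend(rng.sample(idxs, size))
        subsets.append(subset)
    return subsets


def majority_vote(predictions):
    """Combine the labels predicted by the per-subset models for one example."""
    return Counter(predictions).most_common(1)[0][0]
```

One model would be trained per subset; at test time, `majority_vote` is applied to the list of labels those models predict for each utterance.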

Citations: 7
