2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Enhancing observability in power distribution grids

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-11-18 DOI: 10.1109/ICASSP.2017.7953018

Siddharth Bhela, V. Kekatos, Liang Zhang, S. Veeramachaneni

Power distribution grids are currently challenged by observability issues due to limited metering infrastructure. At the same time, smart meter data, including local voltage magnitudes and power injections, are collected at grid nodes with renewable generation and demand-response programs. A power-flow-based approach using these data is put forth here to infer the unknown power injections at non-metered grid nodes. The idea is to exploit the control capabilities of smart inverters and the relative time-invariance of conventional loads, and to solve the nonlinear power flow equations jointly over two system realizations. An intuitive condition pertaining to the graph of the underlying grid is shown to be necessary and sufficient for the local identifiability of this task. The derived graph-theoretic criterion can be checked efficiently and is numerically verified under realistic scenarios on the IEEE 13-bus feeder.

Citations: 10

A subspace approach for shrinkage parameter selection in undersampled configuration for Regularised Tyler Estimators

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-09-05 DOI: 10.1109/ICASSP.2017.7952765

Q. Hoarau, A. Breloy, G. Ginolhac, A. Atto, J. Nicolas

Regularized Tyler Estimators (RTEs) have drawn attention in recent years due to their good performance over a wide range of noise distributions and their natural robustness to outliers. Developing adaptive methods for selecting the regularisation parameter α is currently an active research topic. Indeed, the bias-performance trade-off of RTEs depends heavily on the application considered. Thus, finding a generic rule that is optimal for every criterion and/or data configuration is not straightforward. This paper addresses this issue for undersampled configurations (number of samples lower than the dimension of the data) and proposes a new regularisation parameter selection method based on a subspace reduction approach. The performance of this method is investigated in terms of estimation accuracy and for adaptive detection purposes, on both simulated and real data.
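
For readers unfamiliar with RTEs, the sketch below implements the widely used shrinkage Tyler fixed-point iteration C ← (1 − α)(m/n) ∑ᵢ xᵢxᵢᴴ / (xᵢᴴC⁻¹xᵢ) + αI, where m is the data dimension and n the number of samples. It is a generic illustration, not the subspace-based selection rule proposed in the paper: the function name `regularized_tyler`, the identity initialisation, the stopping rule and the value of α are assumptions made for the example. Because αI is added at every step, C stays invertible even when n < m, i.e. in the undersampled case discussed above.

```python
import numpy as np


def regularized_tyler(X, alpha, n_iter=1000, tol=1e-10):
    """Shrinkage (regularized) Tyler estimator via fixed-point iteration.

    X     : (m, n) array with one sample per column (m = dimension, n = samples).
    alpha : shrinkage parameter in (0, 1]; alpha = 1 returns the identity.
    Generic illustration only; not the paper's alpha-selection rule.
    """
    m, n = X.shape
    cov = np.eye(m)  # assumed initialisation
    for _ in range(n_iter):
        inv = np.linalg.inv(cov)
        # quadratic forms x_i^H C^{-1} x_i, one per sample (column)
        q = np.real(np.einsum("in,ij,jn->n", X.conj(), inv, X))
        # normalised sum of x_i x_i^H / q_i
        scatter = (m / n) * (X / q) @ X.conj().T
        # shrink towards the identity; keeps cov invertible even when n < m
        new_cov = (1 - alpha) * scatter + alpha * np.eye(m)
        converged = np.linalg.norm(new_cov - cov) <= tol * np.linalg.norm(cov)
        cov = new_cov
        if converged:
            break
    return cov


# Undersampled toy example: 5 samples in dimension 8.
# alpha = 0.6 is an illustrative value, assumed large enough for a fixed point to exist.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
C = regularized_tyler(X, alpha=0.6)
print(np.round(np.linalg.eigvalsh(C), 3))
```

Setting α = 1 returns the identity (full shrinkage), while smaller values trust the data more; choosing α between these extremes is precisely the question the paper addresses.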

Citations: 3

Artificial bandwidth extension using the constant Q transform

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-08-04 DOI: 10.1109/ICASSP.2017.7953218

Pramod B. Bachhav, M. Todisco, M. M. Idrissa, C. Beaugeant, N. Evans

Most artificial bandwidth extension (ABE) algorithms are based on the classical source-filter model of speech production. This approach generally requires the dual extension of each component through independent processing. Alternative approaches reported recently operate on the spectrum. With human perception thought to be largely insensitive to phase, most such approaches focus on the extension of the magnitude spectrum alone and rely on Fourier spectral analysis. This paper reports an approach to ABE based on the constant Q transform (CQT), a more perceptually motivated approach to spectral analysis. A Gaussian mixture model is used to estimate missing highband components from available narrowband components before resynthesis with phase estimates obtained from the upsampled narrowband signal. Objective assessment shows that energy normalisation is critical to performance. These findings and the appeal of CQT for ABE are confirmed through informal subjective tests based on the mean opinion score.
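
The defining property of the CQT, as opposed to the DFT, is that bin centre frequencies are geometrically spaced and every bin has the same ratio Q of centre frequency to bandwidth, so analysis windows are longer at low frequencies. A minimal sketch of the bin layout following the textbook definition (f_k = f_min · 2^(k/B), Q = 1/(2^(1/B) − 1), N_k = ⌈Q·f_s/f_k⌉) is given below; the function name `cqt_bins` and the example parameters are hypothetical and are not the configuration used in the paper.

```python
import math


def cqt_bins(f_min, f_max, bins_per_octave, fs):
    """Bin layout of a constant Q transform (textbook definition).

    Returns centre frequencies f_k = f_min * 2**(k / B), the shared quality
    factor Q = 1 / (2**(1 / B) - 1) and window lengths N_k = ceil(Q * fs / f_k).
    """
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    n_bins = int(math.floor(bins_per_octave * math.log2(f_max / f_min))) + 1
    freqs = [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]
    # lower bins get longer windows, so f_k / bandwidth_k stays equal to Q
    lengths = [math.ceil(q * fs / f) for f in freqs]
    return freqs, q, lengths


# Hypothetical setting: 100 Hz to 1.6 kHz, 12 bins per octave, 16 kHz sampling.
freqs, q, lengths = cqt_bins(100.0, 1600.0, 12, 16000)
print(len(freqs), round(q, 3), lengths[0], lengths[-1])
```

The long low-frequency windows give fine frequency resolution where harmonics are closely spaced, at the cost of coarser time resolution; this is the trade-off that makes the CQT closer to auditory frequency resolution than a fixed-length DFT.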

Citations: 14

Salience based lexical features for emotion recognition

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-08-03 DOI: 10.1109/ICASSP.2017.7953274

Kalani Wataraka Gamage, V. Sethu, E. Ambikairajah

In this paper, we focus on the usefulness of verbal events for speech-based emotion recognition. In particular, we propose using phoneme sequences to encode verbal cues related to the expression of emotions, and we introduce lexical features based on these phoneme sequences for automatic emotion recognition systems where manual transcripts are not available. Secondly, a novel estimate of the emotional salience of verbal cues, applicable to both phoneme sequences and words, is presented. Experimental results on the IEMOCAP database show that the proposed automatic phoneme-sequence-based features can achieve an Unweighted Average Recall (UAR) of 49% with the proposed salience measure. Further, the proposed salience measure can lead to a UAR of 64% when using manual word transcriptions. Both are the highest UARs reported on the IEMOCAP database for systems using lexical features extracted from automatic and manual transcripts, respectively.
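
UAR, the metric quoted above, is the mean of the per-class recalls, so every emotion class carries equal weight regardless of how many utterances it contains. A short sketch with toy labels (hypothetical, not IEMOCAP data; the function name is also illustrative):

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class weighs the same, whatever its size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy labels (hypothetical): the classes are imbalanced.
y_true = ["ang", "ang", "hap", "hap", "hap", "hap", "neu", "sad"]
y_pred = ["ang", "neu", "hap", "hap", "hap", "sad", "neu", "neu"]
print(unweighted_average_recall(y_true, y_pred))
```

On these toy labels plain accuracy is 5/8 = 0.625, whereas the UAR is (0.5 + 0.75 + 1 + 0)/4 = 0.5625, because the single misclassified `sad` utterance counts as an entire class.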

Citations: 16

Adapting and controlling DNN-based speech synthesis using input codes

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7953089

Hieu-Thi Luong, Shinji Takaki, G. Henter, J. Yamagishi

Methods for adapting and controlling the characteristics of output speech are important topics in speech synthesis. In this work, we investigated the performance of DNN-based text-to-speech systems that, in parallel with conventional text input, also take speaker, gender, and age codes as inputs in order to 1) perform multi-speaker synthesis, 2) perform speaker adaptation using small amounts of target-speaker adaptation data, and 3) modify synthetic speech characteristics based on the input codes. Using a large-scale, studio-quality speech corpus with 135 speakers of both genders, aged from their 10s to their 80s, we performed three experiments: 1) we used a subset of speakers to construct a DNN-based, multi-speaker acoustic model with speaker codes; 2) we performed speaker adaptation by estimating code vectors for new speakers via backpropagation from a small amount of adaptation material; and 3) we experimented with manually manipulating input code vectors to alter the gender and/or age characteristics of the synthesised speech. Experimental results show that high-performance multi-speaker models can be constructed using the proposed code vectors with a variety of encoding schemes, and that adaptation and manipulation can be performed effectively using the codes.
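
Step 2) above, estimating a code vector for a new speaker by backpropagation while the trained network stays fixed, can be illustrated with a toy model. In the sketch below a linear map stands in for the DNN (an assumption made purely to keep the example small), and plain gradient descent updates only the code c; the function name `estimate_code` and all variable names are hypothetical.

```python
import numpy as np


def estimate_code(W_text, W_code, bias, X, Y, lr=None, n_steps=5000):
    """Fit only the input code c of a frozen toy model y = W_text x + W_code c + bias.

    X: (n, d_text) linguistic features of the adaptation frames.
    Y: (n, d_out) acoustic targets of the new speaker.
    The model weights are never updated; the loss gradient is propagated to c alone.
    """
    if lr is None:
        # 1 / largest eigenvalue of W_code^T W_code: a safe step for this quadratic loss
        lr = 1.0 / np.linalg.norm(W_code, 2) ** 2
    c = np.zeros(W_code.shape[1])
    n = X.shape[0]
    for _ in range(n_steps):
        err = X @ W_text.T + W_code @ c + bias - Y  # (n, d_out) residuals
        grad = W_code.T @ err.sum(axis=0) / n       # gradient of the mean 0.5 * ||err||^2
        c -= lr * grad
    return c


# Toy check: recover a known code from 20 "adaptation frames".
rng = np.random.default_rng(1)
W_text, W_code = rng.standard_normal((6, 4)), rng.standard_normal((6, 3))
bias, X = rng.standard_normal(6), rng.standard_normal((20, 4))
c_true = rng.standard_normal(3)
Y = X @ W_text.T + W_code @ c_true + bias
print(np.round(estimate_code(W_text, W_code, bias, X, Y) - c_true, 6))
```

With a real DNN the same idea applies, except that the gradient with respect to c is obtained by backpropagating through all frozen layers rather than from the closed-form expression used here. Only a handful of parameters are estimated, which is why a small amount of adaptation data can suffice.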

Citations: 72

Line detection in speckle images using Radon transform and ℓ1 regularization

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7953356

N. Anantrasirichai, M. Allinovi, W. Hayes, D. Bull, A. Achim

Boundaries and lines in medical images are important structures, as they can delineate tissue types, organs, and membranes. Although a number of image enhancement and segmentation methods have been proposed to detect lines, none of these has considered line artefacts, which are more difficult to visualise because they are not physical structures, yet are still meaningful for clinical interpretation. This paper presents a novel method to restore lines, including line artefacts, in speckle images. We address this as a sparse estimation problem using a convex optimisation technique based on a Radon transform and sparsity regularisation (ℓ1 norm). The problem is divided into subproblems that are solved using the alternating direction method of multipliers (ADMM), thereby achieving line detection and deconvolution simultaneously. Results on both simulated and in vivo ultrasound images show that the proposed method outperforms existing methods, in particular for detecting B-lines in lung ultrasound images, where performance is improved by up to 30%.
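
The combination of an ℓ1 penalty and ADMM can be illustrated on the generic ℓ1-regularised least-squares problem min_x ½‖Ax − b‖² + λ‖x‖₁. The sketch below follows the standard ADMM splitting x = z; in the paper the operator and variables are defined in the Radon domain and include a deconvolution step, none of which is modelled here. The function names, `rho` and the iteration count are illustrative choices.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def lasso_admm(A, b, lam, rho=1.0, n_iter=500):
    """ADMM for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1 using the split x = z."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # the x-update solves the same linear system at every iteration
    lhs = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(lhs, Atb + rho * (z - u))  # least-squares subproblem
        z = soft_threshold(x + u, lam / rho)           # sparsity subproblem
        u = u + x - z                                  # dual update
    return z


# With A = I the exact solution is soft_threshold(b, lam).
b = np.array([3.0, -0.5, 1.5, 0.2])
print(lasso_admm(np.eye(4), b, lam=1.0))
```

The split is what makes ADMM attractive here: the quadratic data term and the non-smooth ℓ1 term are each handled by a simple subproblem, and the soft-thresholding step sets small coefficients exactly to zero.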

Citations: 1

Recurrent neural network language models for keyword search

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7953263

X. Chen, A. Ragni, J. Vasilakes, X. Liu, K. Knill, M. Gales

Recurrent neural network language models (RNNLMs) have become increasingly popular in many applications such as automatic speech recognition (ASR). Significant improvements in both perplexity and word error rate over standard n-gram LMs have been widely reported on ASR tasks. In contrast, published research on using RNNLMs for keyword search systems has been relatively limited. In this paper, the application of RNNLMs to the IARPA Babel keyword search task is investigated. To supplement the limited acoustic transcription data, large amounts of web text are also used for large-vocabulary design and LM training. Various training criteria are then explored to improve the efficiency of RNNLMs in both training and evaluation. Significant and consistent improvements on both keyword search and ASR tasks were obtained across all languages.
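
Perplexity, one of the two metrics mentioned above, is the exponential of the average negative log-probability that a language model assigns per token; lower is better. A minimal sketch, assuming natural-log token probabilities are already available from an n-gram LM or RNNLM (the function name is illustrative):

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Two tokens with model probabilities 0.5 and 0.125.
print(perplexity([math.log(0.5), math.log(0.125)]))
```

A model that is uniformly unsure among four words at every position has perplexity 4, and the example above also gives 4 because the geometric mean of 0.5 and 0.125 is 0.25, which makes a useful sanity check.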

Citations: 6

Least 1-norm pole-zero modeling with sparse deconvolution for speech analysis

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7952252

Liming Shi, J. Jensen, M. G. Christensen

In this paper, we present a speech analysis method based on sparse pole-zero modeling of speech. Instead of using an all-pole model to approximate the speech production filter, a pole-zero model is used to capture the combined effect of the vocal tract, the radiation at the lips, and the glottal pulse shape. Moreover, to account for the spiky excitation of the pulse train during voiced speech, the model parameters and sparse residuals are estimated iteratively using a least 1-norm pole-zero with sparse deconvolution algorithm. Experimental results show that, compared with the conventional two-stage least-squares pole-zero, linear prediction, and sparse linear prediction methods, the proposed speech analysis method yields lower spectral distortion, a higher reconstruction SNR, and sparser residuals.

Citations: 5

Balanced sensor management across multiple time instances via l-1/l-infinity norm minimization

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7952769

Cristian Rusu, J. Thompson, N. Robertson

In this paper, we propose a solution to the sensor management problem over multiple time instances that balances the accuracy of the sensor network estimate against its utilization. We show how this problem reduces to a binary optimization problem, for which we give a convex-relaxation-based solution that involves minimizing a regularized ℓ∞ reweighted ℓ1 norm. We demonstrate the behavior of the proposed algorithm experimentally and compare it with previous methods from the literature.

Citations: 3

PPG-based heart rate estimation using Wiener filter, phase vocoder and Viterbi decoding

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2017-06-19 DOI: 10.1109/ICASSP.2017.7952309

A. Temko

Accurate heart rate (HR) estimation from the photoplethysmographic (PPG) signal during intensive physical exercise is tackled in this paper. Wiener filters are designed to attenuate the influence of motion artifacts. A phase vocoder is used to refine the initial frequency estimate based on the discrete Fourier transform (DFT). Additionally, Viterbi decoding is used as a novel post-processing step to find the path through the time-frequency state-space plane. System performance is assessed on a publicly available dataset of 23 PPG recordings. The resulting algorithm is designed for scenarios that do not require online HR monitoring (e.g., swimming or offline fitness statistics). With an error rate of 1.31 beats per minute, the resulting system outperforms all other systems reported to date in the literature and, in contrast to existing alternatives, requires no parameter tuning at the post-processing stage and operates at a much lower computational cost. A MATLAB implementation is provided online.
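
Viterbi decoding in this setting picks one frequency bin per frame so that the resulting HR trajectory follows strong spectral peaks without jumping implausibly between frames. The sketch below uses a simple hypothetical score (spectral magnitude minus a penalty proportional to the bin jump, with jumps larger than `max_jump` bins forbidden); the paper's actual state space and transition model are not described in the abstract, so the scoring, the function name and its parameters are assumptions.

```python
def viterbi_track(spectra, max_jump=1, jump_penalty=1.0):
    """Return one bin index per frame maximising a hypothetical path score.

    spectra: list of frames, each a list of non-negative magnitudes per bin.
    Path score = sum of magnitudes on the path - jump_penalty * sum of |bin changes|;
    transitions of more than max_jump bins between consecutive frames are forbidden.
    """
    n_bins = len(spectra[0])
    score = list(spectra[0])  # best score of any path ending in each bin
    backptrs = []
    for frame in spectra[1:]:
        new_score, ptr = [], []
        for j in range(n_bins):
            best_i, best = None, float("-inf")
            for i in range(max(0, j - max_jump), min(n_bins, j + max_jump + 1)):
                cand = score[i] - jump_penalty * abs(j - i)
                if cand > best:
                    best_i, best = i, cand
            new_score.append(best + frame[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # backtrack from the best final bin
    j = max(range(n_bins), key=lambda k: score[k])
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    return path[::-1]


# Toy spectra: true HR in bin 2, a larger motion-artefact peak in bin 5 of the middle frame.
spectra = [
    [0, 0, 5, 0, 0, 0],
    [0, 0, 3, 0, 0, 6],
    [0, 0, 5, 0, 0, 0],
]
print(viterbi_track(spectra))
```

With `max_jump=1` the spurious peak is ignored and the decoded path is [2, 2, 2]; removing the constraint and the penalty (`max_jump=6`, `jump_penalty=0.0`) reduces the decoder to per-frame peak picking, which returns [2, 5, 2]. Because Viterbi backtracking needs the whole recording, it fits the offline scenarios targeted by the paper.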

Citations: 9
