2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Articles

Smooth time-frequency estimation using covariance fitting

Pub Date : 2014-05-04 DOI: 10.1109/icassp.2014.6853702

Johan Brynolfsson, Johan Sward, A. Jakobsson, M. Hansson

In this paper, we introduce a time-frequency spectral estimator for smooth spectra that allows for irregularly sampled measurements. A non-parametric representation of the time-dependent (TD) covariance matrix is formed by assuming that the spectrum is piecewise linear. Using this representation, the time-frequency spectrum is then estimated by solving a convex covariance fitting problem, which, as a byproduct, also provides an enhanced estimate of the TD covariance matrix. Numerical examples using simulated non-stationary processes show that the proposed method performs favorably compared with the classical Wigner-Ville distribution and a smoothed spectrogram.

Citations: 2

Maximum likelihood SNR estimation over time-varying flat-fading SIMO channels

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6854861

F. Bellili, Rabii Meftehi, S. Affes, A. Stephenne

In this paper, we propose a new signal-to-noise-ratio (SNR) maximum likelihood (ML) estimator over time-varying single-input multiple-output (SIMO) channels, for both data-aided (DA) and non-data-aided (NDA) cases. Unlike the classical techniques which assume the channel to be slowly time-varying and, therefore, considered as constant during the observation period, we address the more challenging problem of instantaneous SNR estimation over fast time-varying channels. The channel variations are locally tracked using a polynomial-in-time expansion. In the DA scenario, the ML estimator is developed in closed-form expression. In the NDA scenario, however, the ML estimates of the per-antenna SNRs are obtained iteratively, with very few iterations, using the expectation-maximization (EM) procedure. Our estimator is able to accurately estimate the instantaneous SNRs over a wide range of average SNR. We show through extensive Monte-Carlo simulations that the new estimator outperforms previously developed solutions.
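
The data-aided idea above can be illustrated with a heavily simplified sketch. It is not the estimator from the paper: it assumes a single receive antenna, real-valued signals, known pilots in {+1, -1}, and a least-squares fit of the polynomial-in-time channel model, with the noise variance taken from the fit residuals. All function and variable names are hypothetical.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def da_snr_estimate(y, pilots, times, order):
    """Fit h(t) = sum_k c_k t^k by least squares and estimate per-sample SNR."""
    # With pilots in {+1, -1}, y_n * a_n equals h(t_n) plus noise.
    z = [yn * an for yn, an in zip(y, pilots)]
    basis = [[t ** k for k in range(order + 1)] for t in times]
    # Normal equations (B^T B) c = B^T z.
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(order + 1)]
            for i in range(order + 1)]
    rhs = [sum(row[i] * zn for row, zn in zip(basis, z))
           for i in range(order + 1)]
    coeffs = solve_linear(gram, rhs)
    h_hat = [sum(c * b for c, b in zip(coeffs, row)) for row in basis]
    # Residual-based noise variance (an assumption of this sketch).
    dof = len(y) - (order + 1)
    noise_var = sum((zn - hn) ** 2 for zn, hn in zip(z, h_hat)) / dof
    snr = [hn ** 2 / noise_var for hn in h_hat]
    return coeffs, noise_var, snr


# Synthetic example: a channel that drifts linearly during the observation.
random.seed(0)
N = 200
times = [n / N for n in range(N)]
pilots = [random.choice((-1.0, 1.0)) for _ in range(N)]
true_h = [1.0 + 0.5 * t for t in times]
y = [h * a + random.gauss(0.0, 0.1) for h, a in zip(true_h, pilots)]
coeffs, noise_var, snr = da_snr_estimate(y, pilots, times, order=1)
```

Because the channel is modeled as a polynomial rather than a constant, `snr` holds one estimate per sample instead of a single average value for the whole observation window.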

Citations: 8

Signal processing challenges for radio astronomical arrays

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6854631

S. Wijnholds, A. V. D. Veen, F. D. Stefani, E. L. Rosa, A. Farina

Current and future radio telescopes, in particular the Square Kilometre Array (SKA), are envisaged to produce large images (> 10^8 pixels) with over 60 dB dynamic range. This poses a number of image reconstruction and technological challenges, which will require novel approaches to image reconstruction and design of data processing systems. In this paper, we sketch the limitations of current algorithms by extrapolating their computational requirements to future radio telescopes as well as by discussing their imaging limitations. We discuss a number of potential research directions to cope with these challenges.

Citations: 11

Multiple-average-voice-based speech synthesis

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6853603

P. Lanchantin, M. Gales, Simon King, J. Yamagishi

This paper describes a novel approach for the speaker adaptation of statistical parametric speech synthesis systems based on the interpolation of a set of average voice models (AVM). Recent results have shown that the quality/naturalness of adapted voices depends on the distance from the average voice model used for speaker adaptation. This suggests the use of several AVMs trained on carefully chosen speaker clusters from which a more suitable AVM can be selected/interpolated during the adaptation. In the proposed approach a set of AVMs, a multiple-AVM, is trained on distinct clusters of speakers which are iteratively re-assigned during the estimation process initialised according to metadata. During adaptation, each AVM from the multiple-AVM is first adapted towards the target speaker. The adapted means from the AVMs are then interpolated to yield the final speaker adapted mean for synthesis. It is shown, performing speaker adaptation on a corpus of British speakers with various regional accents, that the quality/naturalness of synthetic speech of adapted voices is significantly higher than when considering a single factor-independent AVM selected according to the target speaker characteristics.
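
The final interpolation step can be sketched as a weighted average of the adapted mean vectors. The abstract does not say how the interpolation weights are obtained, so the weights below are placeholders, and the function name is hypothetical.

```python
def interpolate_means(adapted_means, weights):
    """Return the weighted average of several adapted mean vectors."""
    if len(adapted_means) != len(weights):
        raise ValueError("need one weight per adapted mean")
    total = float(sum(weights))
    if total == 0.0:
        raise ValueError("weights must not sum to zero")
    dim = len(adapted_means[0])
    return [
        sum(w * mean[d] for w, mean in zip(weights, adapted_means)) / total
        for d in range(dim)
    ]


# Two hypothetical adapted means (2-dimensional for brevity) and weights.
final_mean = interpolate_means([[0.0, 2.0], [4.0, 6.0]], [1.0, 3.0])
```

Normalizing by the weight sum keeps the result inside the convex hull of the adapted means whenever the weights are non-negative.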

Citations: 6

Waveform selection for range and Doppler estimation via Barankin bound signal-to-noise ratio threshold

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6854485

John S. Kota, N. Kovvali, D. Bliss, A. Papandreou-Suppappola

In this paper, we consider the tracking of a radar target with unknown range and range rate at low signal-to-noise ratio (SNR). For this nonlinear estimation problem, the Cramér-Rao lower bound (CRLB) provides a bound on an unbiased estimator's mean-squared error (MSE). However, there exists a threshold SNR at which the estimator variance deviates from the CRLB. We consider the Barankin bound (BB) on the range and range-rate variance in order to obtain a tighter lower bound at low SNR, and we use the BB to predict the SNR threshold for a transmitted signal. We demonstrate that the BB with the additional information provided by the threshold SNR has an advantage over the CRLB in selecting the optimal transmit waveform at low SNRs. We also develop a waveform parameter configuration method that uses the BB and the ambiguity function resolution cell measurement model to optimize the SNR threshold.

Citations: 5

Functional relevant multichannel kernel adaptive filter for human activity analysis

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6854427

A. Álvarez-Meza, G. Castellanos-Domínguez, J. Príncipe

A multichannel kernel adaptive filtering framework is presented that highlights relevant channels for the task of analyzing Motion Capture (MoCap) data. Functional relevance analysis is performed over input multichannel data by computing the pair-wise channel similarities to describe the main behavior of the considered applications. In particular, the well-known Kernel Least Mean Square filter is enhanced using a correntropy-based similarity criterion between channel pairs. In addition, two sparseness criteria are studied to extract a sample subset that constructs a learning model with a good trade-off between filter complexity and accuracy. The proposed approach allows devising complex relationships among multi-channel time-series, revealing dependencies among the channels and the process time-structure. The method is tested on a well-known MoCap data set. Results show that our framework is an adequate alternative for finding functional relevance amongst multi-channel time-series.
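
A pair-wise, correntropy-based channel similarity can be sketched as follows. This is only the standard sample estimator of correntropy with a Gaussian kernel, written here without the kernel's normalizing constant so that identical channels score exactly 1. How the paper combines this measure with the Kernel Least Mean Square filter is not stated in the abstract, and the function names are hypothetical.

```python
import math


def correntropy(x, y, sigma):
    """Sample correntropy of two equal-length sequences (Gaussian kernel)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    total = sum(
        math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, y)
    )
    return total / len(x)


def channel_similarity(channels, sigma):
    """Return the symmetric matrix of pair-wise channel correntropies."""
    n = len(channels)
    return [
        [correntropy(channels[i], channels[j], sigma) for j in range(n)]
        for i in range(n)
    ]


# Three toy channels: the first two are identical, the third is far away.
channels = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [5.0, 6.0, 7.0]]
sim = channel_similarity(channels, sigma=1.0)
```

The kernel width `sigma` controls how quickly the similarity decays with the sample-wise difference, so it would need to be tuned to the scale of the data.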

Citations: 0

High accuracy discrimination of Parkinson's disease participants from healthy controls using smartphones

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6854280

S. Arora, V. Venkataraman, S. Donohue, K. Biglan, E. Dorsey, Max A. Little

The aim of this study is to accurately distinguish Parkinson's disease (PD) participants from healthy controls using self-administered tests of gait and postural sway. Using consumer-grade smartphones with in-built accelerometers, we objectively measure and quantify key movement severity symptoms of Parkinson's disease. Specifically, we record tri-axial accelerations, and extract a range of different features based on the time and frequency-domain properties of the acceleration time series. The features quantify key characteristics of the acceleration time series, and enhance the underlying differences in the gait and postural sway accelerations between PD participants and controls. Using a random forest classifier, we demonstrate an average sensitivity of 98.5% and average specificity of 97.5% in discriminating PD participants from controls.
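
For reference, the two reported metrics follow directly from the confusion matrix of a binary classifier. The sketch below uses made-up labels (1 for PD, 0 for control), not the study's data, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of PD participants detected
    specificity = tn / (tn + fp)  # fraction of controls correctly cleared
    return sensitivity, specificity


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

An "average" sensitivity or specificity, as quoted in the abstract, would typically be the mean of these values over repeated train/test splits, although the abstract does not describe the validation scheme.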

Citations: 65

Detection of sign-language content in video through polar motion profiles

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6853805

Virendra Karappa, C. D. D. Monteiro, F. Shipman, R. Gutierrez-Osuna

Locating sign language (SL) videos on video sharing sites (e.g., YouTube) is challenging because search engines generally do not use the visual content of videos for indexing. Instead, indexing is done solely based on textual content (e.g., title, description, metadata). As a result, untagged SL videos do not appear in the search results. In this paper, we present and evaluate a classification approach to detect SL videos based on their visual content. The approach uses an ensemble of Haar-based face detectors to define regions of interest (ROI), and a background model to segment movements in the ROI. The two-dimensional (2D) distribution of foreground pixels in the ROI is then reduced to two 1D polar motion profiles by means of a polar-coordinate transformation, and then classified by means of an SVM. When evaluated on a dataset of user-contributed YouTube videos, the approach achieves 81% precision and 94% recall.
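
One plausible reading of the polar-coordinate reduction is sketched below: each foreground pixel in the ROI is converted to a radius and an angle about a reference point, and the two coordinates are histogrammed separately. The choice of reference point, the bin counts, and the normalization are assumptions of this sketch, not details taken from the paper.

```python
import math


def polar_motion_profiles(mask, center, n_angle_bins, n_radius_bins,
                          max_radius):
    """Reduce a binary foreground mask to angular and radial histograms."""
    cy, cx = center
    angle_profile = [0.0] * n_angle_bins
    radius_profile = [0.0] * n_radius_bins
    count = 0
    for row, line in enumerate(mask):
        for col, value in enumerate(line):
            if not value:
                continue
            dx = col - cx
            dy = cy - row  # image rows grow downward, so flip the sign
            radius = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            a_bin = int(angle / (2.0 * math.pi) * n_angle_bins) % n_angle_bins
            r_bin = min(int(radius * n_radius_bins / max_radius),
                        n_radius_bins - 1)
            angle_profile[a_bin] += 1
            radius_profile[r_bin] += 1
            count += 1
    if count:
        # Normalize so the profiles do not depend on the amount of motion.
        angle_profile = [v / count for v in angle_profile]
        radius_profile = [v / count for v in radius_profile]
    return angle_profile, radius_profile


# Toy 5x5 mask with three foreground pixels around the center (2, 2).
mask = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
angles, radii = polar_motion_profiles(mask, (2, 2), 4, 3, 3.0)
```

The two profiles could then be concatenated into a single feature vector for a classifier such as an SVM.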

Citations: 8

Non-uniform sampling and Gaussian process regression in transport of intensity phase imaging

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6855115

Jingshan Zhong, Rene A. Claus, J. Dauwels, L. Tian, L. Waller

Gaussian process (GP) regression is a nonparametric regression method that can be used to predict continuous quantities. Here, we show that the same technique can be applied to a class of phase imaging techniques based on measurements of intensity at multiple propagation distances, i.e. the transport of intensity equation (TIE). In this paper, we demonstrate how to apply GP regression to estimate the first intensity derivative along the direction of propagation and incorporate non-uniform propagation distance sampling. The low-frequency artifacts that often occur in phase recovery using traditional methods can be significantly suppressed by the proposed GP TIE method. The method is shown to be stable with moderate amounts of Gaussian noise. We validate the method experimentally by recovering the phase of human cheek cells in a bright field microscope and show better performance as compared to other TIE reconstruction methods.
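
The derivative estimate can be sketched for a single pixel as follows. This is a minimal illustration that assumes a zero-mean GP prior with a squared-exponential kernel and fixed hyperparameters; the kernel, the hyperparameter values, and all names are assumptions and are not taken from the paper. The derivative of the posterior mean is obtained by differentiating the kernel with respect to the query position.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def se_kernel(z1, z2, length):
    """Squared-exponential covariance with unit signal variance."""
    return math.exp(-((z1 - z2) ** 2) / (2.0 * length ** 2))


def gp_derivative(z_samples, intensities, z_star, length, noise_var):
    """Posterior-mean estimate of dI/dz at z_star."""
    mean_i = sum(intensities) / len(intensities)
    y = [v - mean_i for v in intensities]  # remove the constant offset
    n = len(z_samples)
    K = [[se_kernel(z_samples[i], z_samples[j], length)
          + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve_linear(K, y)
    # d/dz* k(z*, z_i) = -(z* - z_i) / length^2 * k(z*, z_i)
    return sum(
        a * (-(z_star - zi) / length ** 2) * se_kernel(z_star, zi, length)
        for a, zi in zip(alpha, z_samples)
    )


# Non-uniformly spaced defocus distances and a synthetic linear intensity.
z_samples = [-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0]
intensities = [1.0 + 0.3 * z for z in z_samples]
slope = gp_derivative(z_samples, intensities, z_star=0.0,
                      length=2.0, noise_var=1e-4)
```

Nothing in the regression requires equally spaced `z_samples`, which is why non-uniform propagation distances fit naturally into this formulation.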

Citations: 2

Stability and MSE analyses of affine projection algorithms for sparse system identification

Pub Date : 2014-05-04 DOI: 10.1109/ICASSP.2014.6854836

Markus V. S. Lima, I. Sobrón, W. Martins, P. Diniz

We analyze two algorithms, viz. the affine projection algorithm for sparse system identification (APA-SSI) and the quasi APA-SSI (QAPA-SSI), regarding their stability and steady-state mean-squared error (MSE). These algorithms exploit the sparsity of the involved signals through an approximation of the ℓ0 norm. Such an approach yields faster convergence and reduced steady-state MSE, as compared to algorithms that do not take the sparse nature of the signals into account. In addition, modeling sparsity via such an approximation has been consistently verified to be superior to the widely used ℓ1 norm in several scenarios. In this paper, we show how to properly set the parameters of the two aforementioned algorithms in order to guarantee convergence, and we derive closed-form theoretical expressions for their steady-state MSE. A key conclusion from the proposed analysis is that the MSE of these two algorithms is a monotonically decreasing function of the sparsity degree. Simulation results are used to validate the theoretical findings.
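
The abstract does not say which approximation of the ℓ0 norm is used. One common smooth surrogate in the sparse adaptive filtering literature is the sum of 1 - exp(-beta * |w_i|) over the coefficients, sketched below together with its gradient. The parameter name `beta` and the function names are assumptions of this sketch.

```python
import math


def l0_approx(w, beta):
    """Smooth surrogate of the number of nonzero entries in w."""
    return sum(1.0 - math.exp(-beta * abs(wi)) for wi in w)


def l0_approx_gradient(w, beta):
    """Gradient of l0_approx with respect to each coefficient."""
    def sign(v):
        return (v > 0) - (v < 0)

    return [beta * sign(wi) * math.exp(-beta * abs(wi)) for wi in w]


# A sparse coefficient vector with two nonzero taps.
w = [0.0, 0.0, 5.0, -5.0]
approx = l0_approx(w, beta=10.0)
grad = l0_approx_gradient(w, beta=10.0)
```

Larger `beta` makes the surrogate closer to the true count of nonzero coefficients, while the gradient concentrates on coefficients near zero, which is what pulls small taps toward zero in a sparsity-aware update.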

Citations: 22
