Latest publications: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Binaural speech segregation based on pitch and azimuth tracking
John F. Woodruff, Deliang Wang
We propose an approach to binaural speech segregation in reverberation based on pitch and azimuth cues. These cues are integrated within a statistical tracking framework to estimate up to two concurrent pitch frequencies and three concurrent azimuth angles. The tracking framework implicitly estimates binary time-frequency masks by solving a data association problem, thereby performing speech segregation. Experimental results show that the proposed approach compares favorably to existing two-microphone systems despite using less prior information. The benefit of the proposed approach is most pronounced in conditions with substantial reverberation or for closely spaced sources.
DOI: 10.1109/ICASSP.2012.6287862 | Published: 2012-03-25
Cited by: 4
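The binary time-frequency masks mentioned in the abstract can be illustrated with a small sketch. This is not the paper's tracking-based estimator; it is a hypothetical ideal-mask computation that assumes the per-unit target and interference energies are known, with the 0 dB local-SNR criterion as an assumption.

```python
import math

def binary_mask(target_energy, interference_energy, lc_db=0.0):
    """Label a time-frequency unit 1 when the local target-to-interference
    ratio meets the threshold lc_db (in dB), else 0."""
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        row = []
        for t, i in zip(t_row, i_row):
            if i <= 0.0:
                row.append(1)  # no interference energy: keep the unit
            elif t <= 0.0:
                row.append(0)  # no target energy: discard the unit
            else:
                snr_db = 10.0 * math.log10(t / i)
                row.append(1 if snr_db >= lc_db else 0)
        mask.append(row)
    return mask

# Rows = frequency channels, columns = time frames (toy energies).
target = [[4.0, 0.1], [2.0, 2.0]]
interf = [[1.0, 1.0], [1.0, 4.0]]
print(binary_mask(target, interf))  # [[1, 0], [1, 0]]
```

Applying such a mask to the mixture's time-frequency representation retains only the units dominated by the target, which is what performs the segregation.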
Improved minimum converted trajectory error training for real-time speech-to-lips conversion
Wei Han, Lijuan Wang, F. Soong, Bo Yuan
Gaussian mixture model (GMM) based speech-to-lips conversion often operates in two alternative ways: batch conversion and sliding window-based conversion for real-time processing. Previously, Minimum Converted Trajectory Error (MCTE) training has been proposed to improve the performance of batch conversion. In this paper, we extend the previous work and propose a new training criterion, MCTE for Real-time conversion (R-MCTE), to explicitly optimize the quality of sliding window-based conversion. In R-MCTE, we use the probabilistic descent method to refine model parameters by minimizing the error on real-time converted visual trajectories over the training data. Objective evaluations on the LIPS 2008 Visual Speech Synthesis Challenge data set show that the proposed method achieves both good lip animation performance and low delay in real-time conversion.
DOI: 10.1109/ICASSP.2012.6288921 | Published: 2012-03-25
Cited by: 7
A novel eye region based privacy protection scheme
Dohyoung Lee, K. Plataniotis
This paper introduces a novel eye region scrambling scheme capable of protecting privacy-sensitive eye region information in video content. The proposed system consists of an automatic eye detection module followed by a privacy-enabling JPEG XR encoder module. An object detection method based on a probabilistic model of image generation is used in conjunction with skin-tone segmentation to accurately locate eye regions in real time. The JPEG XR encoder effectively degrades the visual quality of privacy-sensitive eye regions at low computational cost. The performance of the proposed solution is validated using benchmark face recognition algorithms on a face image database. Experimental results indicate that the proposed solution is able to conceal identity by preventing successful identification at low computational cost.
DOI: 10.1109/ICASSP.2012.6288261 | Published: 2012-03-25
Cited by: 5
Analysis of the sphericalwave truncation error for spherical harmonic soundfield expansions 球面谐波声场扩展的球波截断误差分析
S. Brown, Shuai Wang, D. Sen
Three dimensional soundfield recording and reproduction is an area of ongoing investigation, and its implementation is increasingly achieved through use of the infinite Spherical Harmonic soundfield expansion. Perfect recording or reconstruction requires infinitely many microphones or loudspeakers, respectively. Thus, real-world approximations to both require spatial discretisation, which truncates the soundfield expansion and loses some of the soundfield information. The resulting truncation error is the focus of this paper, specifically for soundfields composed of spherical waves. We define two norms of the truncation-error-to-signal ratio, L2 and L∞, for comparison and use in different situations. Finally, we observe how some of these errors converge to the plane-wave case under certain circumstances.
DOI: 10.1109/ICASSP.2012.6287803 | Published: 2012-03-25
Cited by: 3
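As a numerical companion to the truncation discussion: an order-N spherical harmonic expansion retains (N+1)² terms, and a common rule of thumb (a heuristic, not the paper's L2/L∞ analysis) chooses N ≈ ⌈kr⌉ so that the truncated expansion remains accurate inside radius r at wavenumber k.

```python
import math

def num_sh_coefficients(order):
    """An order-N spherical harmonic expansion keeps (N + 1)**2 terms."""
    return (order + 1) ** 2

def rule_of_thumb_order(frequency_hz, radius_m, speed_of_sound=343.0):
    """Heuristic N = ceil(k * r): above this order the truncation error
    inside radius r is commonly considered small."""
    k = 2.0 * math.pi * frequency_hz / speed_of_sound  # wavenumber (rad/m)
    return math.ceil(k * radius_m)

print(num_sh_coefficients(4))            # 25 terms for a 4th-order expansion
print(rule_of_thumb_order(1000.0, 0.5))  # ceil(18.32 * 0.5) = 10
```

The quadratic growth of the coefficient count with order is what makes real-world arrays (finitely many microphones or loudspeakers) necessarily truncate the expansion.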
Inventory-style speech enhancement with uncertainty-of-observation techniques
R. M. Nickel, Ramón Fernández Astudillo, D. Kolossa, Steffen Zeiler, Rainer Martin
We present a new method for inventory-style speech enhancement that significantly improves over earlier approaches [1]. Inventory-style enhancement attempts to resynthesize a clean speech signal from a noisy signal via corpus-based speech synthesis. The advantage of such an approach is that one is not bound to trade noise suppression against signal distortion in the way that most traditional methods do; a significant improvement in perceptual quality is typically the result. Disadvantages of this new approach, however, include speaker dependency, increased processing delay, and the need for substantial system training. Earlier published methods relied on a priori knowledge of the expected noise type during the training process [1]. In this paper we present a new method that exploits uncertainty-of-observation techniques to circumvent the need for noise-specific training. Experimental results show that the new method not only matches but outperforms the earlier approaches in perceptual quality.
DOI: 10.1109/ICASSP.2012.6288954 | Published: 2012-03-25
Cited by: 7
Face recognition based on nonsubsampled contourlet transform and block-based kernel Fisher linear discriminant
Biao Wang, Weifeng Li, Q. Liao
Face representation, including both feature extraction and feature selection, is the key issue for a successful face recognition system. In this paper, we propose a novel face representation scheme based on the nonsubsampled contourlet transform (NSCT) and block-based kernel Fisher linear discriminant (BKFLD). NSCT is a newly developed multiresolution analysis tool that can extract both the intrinsic geometrical structure and the directional information in images, which implies its discriminative potential for effective feature extraction from face images. By encoding the NSCT coefficient images with the local binary pattern (LBP) operator, we obtain a robust feature set. Furthermore, the kernel Fisher linear discriminant is introduced to select the most discriminative feature sets, and the block-based scheme is incorporated to address the small sample size problem. Face recognition experiments on the FERET database demonstrate the effectiveness of the proposed approach.
DOI: 10.1109/ICASSP.2012.6288183 | Published: 2012-03-25
Cited by: 15
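The LBP encoding applied to the NSCT coefficient images can be sketched for a single 3×3 patch. The neighbour ordering and the "greater than or equal to centre" convention below are one common choice, not necessarily the authors' exact variant.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP for a 3x3 patch: threshold each neighbour
    against the centre pixel and pack the resulting bits clockwise
    starting from the top-left neighbour."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:  # neighbour at least as bright as the centre -> bit = 1
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # 241 (bits 0, 4, 5, 6, 7 set)
```

Sliding this operator over an image and histogramming the 256 possible codes per block yields the kind of robust local-texture feature set the abstract refers to.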
Handling incomplete matrix data via continuous-valued infinite relational model
Tomohiko Suzuki, Takuma Nakamura, Yasutoshi Ida, Takashi Matsumoto
A continuous-valued infinite relational model is proposed as a solution to the co-clustering problem that arises in matrix or tensor data. The model is a probabilistic model built on the framework of Bayesian nonparametrics, which can estimate the number of components in posterior distributions. The original Infinite Relational Model cannot handle continuous-valued or multi-dimensional data directly. Our proposed model overcomes these data-representation restrictions through the proposed likelihood, which can handle many types of data. The posterior distribution is estimated via variational inference. Using real-world data, we show that the proposed model outperforms the original model in terms of AUC score and efficiency on a movie recommendation task.
DOI: 10.1109/ICASSP.2012.6288338 | Published: 2012-03-25
Cited by: 0
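The AUC score used for evaluation above can be computed directly as the probability that a randomly chosen positive item outranks a randomly chosen negative one, with ties counting half. A minimal sketch with toy recommendation scores:

```python
def auc(pos_scores, neg_scores):
    """AUC as the win rate of positive-vs-negative score pairs
    (ties count 0.5), equivalent to the area under the ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# 3 relevant items vs 2 irrelevant items: 5 of 6 pairs ranked correctly.
print(round(auc([0.9, 0.8, 0.4], [0.7, 0.3]), 4))  # 0.8333
```

This pairwise form is O(PN); for large score lists a rank-based computation is cheaper, but the value is identical.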
A study of discriminative feature extraction for i-vector based acoustic sniffing in IVN acoustic model training
Yu Zhang, Jian Xu, Zhijie Yan, Qiang Huo
Recently, we proposed an i-vector approach to acoustic sniffing for irrelevant variability normalization based acoustic model training in large vocabulary continuous speech recognition (LVCSR). Its effectiveness has been confirmed by experimental results on a Switchboard-1 conversational telephone speech transcription task. In this paper, we study several discriminative feature extraction approaches in i-vector space to improve both recognition accuracy and run-time efficiency. New experimental results are reported on a much larger scale LVCSR task with about 2000 hours of training data.
DOI: 10.1109/ICASSP.2012.6288814 | Published: 2012-03-25
Cited by: 0
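For context, vectors in i-vector space are commonly compared with cosine scoring. This generic sketch (a standard baseline, not the discriminative feature extraction studied in the paper) shows the similarity computation on toy low-dimensional vectors:

```python
import math

def cosine_score(u, v):
    """Cosine similarity between two i-vectors: the dot product of the
    length-normalised vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Parallel vectors score 1.0 regardless of magnitude.
print(round(cosine_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # 1.0
```

Because the score ignores vector magnitude, it compares only the direction of the i-vectors, which is why length normalisation is a common preprocessing step in this space.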
Multichannel speech dereverberation and separation with optimized combination of linear and non-linear filtering
M. Togami, Y. Kawaguchi, Ryu Takeda, Y. Obuchi, N. Nukaga
In this paper, we propose a multichannel speech dereverberation and separation technique that is effective even when there are multiple speakers and each speaker's transfer function is time-varying due to movement of the corresponding speaker's head. For robustness against such fluctuation, the proposed method jointly optimizes linear and non-linear filtering from a probabilistic perspective based on a probabilistic reverberant transfer-function model, PRTFM. PRTFM extends the conventional time-invariant transfer-function model to uncertain conditions and can also be regarded as an extension of recently proposed blind local Gaussian modeling. The linear and non-linear filtering are optimized in the MMSE (Minimum Mean Square Error) sense during parameter optimization. The proposed method is evaluated in a reverberant meeting room and shown to be effective.
DOI: 10.1109/ICASSP.2012.6288809 | Published: 2012-03-25
Cited by: 10
Trade-off evaluation for speech enhancement algorithms with respect to the a priori SNR estimation
Pei Chee Yong, S. Nordholm, H. H. Dam
In this paper, a modified a priori SNR estimator is proposed for speech enhancement. The well-known decision-directed (DD) approach is modified by matching each gain function with the noisy speech spectrum of the current frame rather than the previous one. The proposed algorithm eliminates speech transient distortion and reduces the impact of the gain-function choice on the degree of smoothing in the SNR estimate. An objective evaluation metric is employed to measure the trade-off between musical noise, noise reduction, and speech distortion. Performance is evaluated and compared among a modified sigmoid gain function, the state-of-the-art log-spectral amplitude estimator, and the Wiener filter. Simulation results show that the modified DD approach performs better in terms of this trade-off.
DOI: 10.1109/ICASSP.2012.6288957 | Published: 2012-03-25
Cited by: 6
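The conventional decision-directed estimator that this paper modifies combines the previous frame's clean-speech power estimate with the current maximum-likelihood SNR term. A minimal single-bin sketch of that baseline (the smoothing factor α = 0.98 is the customary choice, used here as an assumption; this is not the paper's modified variant):

```python
def decision_directed_xi(prev_amp_sq, noise_var, gamma, alpha=0.98):
    """Decision-directed a priori SNR estimate for one time-frequency bin:
    a weighted sum of the previous frame's estimated clean-speech power
    (normalised by the noise variance) and the current maximum-likelihood
    term max(gamma - 1, 0), where gamma is the a posteriori SNR."""
    ml_term = max(gamma - 1.0, 0.0)
    return alpha * (prev_amp_sq / noise_var) + (1.0 - alpha) * ml_term

# Previous clean amplitude^2 = 2.0, noise variance = 1.0, gamma = 3.0:
# both terms equal 2.0, so the estimate is 2.0 for any alpha.
xi = decision_directed_xi(2.0, 1.0, 3.0, alpha=0.98)
print(round(xi, 4))  # 2.0
```

The large α is what smooths the SNR trajectory and suppresses musical noise, at the cost of the one-frame lag that the paper's current-frame matching addresses.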