Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452): latest papers

Isotropic noise modelling for nearfield array processing
T. Abhayapala, R. Kennedy, R. Williamson
An exact series representation for a nearfield spherically isotropic noise model is introduced. The methodology uses the spherical harmonics expansion of the wavefield at a sensor to obtain the correlation between two sensors due to the nearfield isotropic noise field. The result is useful in nearfield applications of sensor arrays. The proposed noise model can be used to apply well-established farfield array processing algorithms to nearfield applications. Specifically, any signal processing criterion based on farfield isotropic noise correlation can be reformulated for nearfield noise using this representation. A simple array gain optimization is used to demonstrate the new noise model.
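For orientation, the farfield baseline that the abstract proposes to replace can be sketched in a few lines. In a farfield spherically isotropic noise field, the coherence between two omnidirectional sensors a distance d apart is sin(kd)/(kd). The nearfield series itself is not given in the abstract and is not reproduced here. The function names, the sound speed of 343 m/s, the four-element line array and the uniform weights are all illustrative assumptions.

```python
import math

def farfield_isotropic_coherence(distance_m, freq_hz, c=343.0):
    """Coherence sin(kd)/(kd) between two sensors in farfield isotropic noise."""
    kd = 2.0 * math.pi * freq_hz / c * distance_m
    return 1.0 if kd == 0.0 else math.sin(kd) / kd

def coherence_matrix(positions_m, freq_hz):
    """Noise coherence matrix for sensors on a line (positions in metres)."""
    return [[farfield_isotropic_coherence(abs(p - q), freq_hz) for q in positions_m]
            for p in positions_m]

def array_gain_broadside(weights, gamma):
    """Array gain (sum w)^2 / (w^T Gamma w) for real weights, broadside look direction."""
    n = len(weights)
    signal_power = sum(weights) ** 2
    noise_power = sum(weights[i] * gamma[i][j] * weights[j]
                      for i in range(n) for j in range(n))
    return signal_power / noise_power

# Hypothetical 4-element line array with 5 cm spacing, evaluated at 1 kHz.
positions = [0.0, 0.05, 0.10, 0.15]
gamma = coherence_matrix(positions, 1000.0)
gain = array_gain_broadside([0.25] * 4, gamma)
```

Under the abstract's claim, a criterion of this kind would be reformulated for nearfield use by swapping the entries of `gamma` for the nearfield correlation values while leaving the gain expression unchanged.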
DOI: https://doi.org/10.1109/ASPAA.1999.810837 | Published: 1999-10-17
Citations: 2
New phase-vocoder techniques for pitch-shifting, harmonizing and other exotic effects
Jean Laroche, M. Dolson
The phase-vocoder is usually presented as a high-quality solution for time-scale modification of signals, pitch-scale modifications usually being implemented as a combination of time-scaling and sampling-rate conversion. We present two new phase-vocoder-based techniques which allow direct manipulation of the signal in the frequency domain, enabling such applications as pitch-shifting, chorusing, harmonizing, partial stretching and other exotic modifications which cannot be achieved by the standard time-scaling and sampling-rate conversion scheme. The new techniques are based on a very simple peak-detection stage, followed by a peak-shifting stage. The simplest technique allows for 50% overlap but restricts the precision of the modifications, while the most flexible technique requires a more expensive 75% overlap.
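The two stages named in the abstract, peak detection followed by peak shifting, can be sketched on a single short-time spectrum as below. This is a simplified illustration, not the authors' implementation. The peak criterion (a bin larger than its two neighbours on each side), the midpoint region boundaries, the integer-bin shifts and the function names are assumptions. The phase adjustment that a real phase vocoder needs to keep successive frames coherent is omitted.

```python
def find_peaks(mags, reach=2):
    """Return bins whose magnitude exceeds that of `reach` neighbours on each side."""
    peaks = []
    for k in range(reach, len(mags) - reach):
        neighbours = mags[k - reach:k] + mags[k + 1:k + reach + 1]
        if all(mags[k] > m for m in neighbours):
            peaks.append(k)
    return peaks

def shift_peaks(spectrum, peaks, shifts):
    """Move each peak, together with the bins around it, by shifts[i] bins."""
    out = [0j] * len(spectrum)
    # Each peak owns the bins up to the midpoint towards its neighbouring peaks.
    bounds = [0] + [(a + b) // 2 + 1 for a, b in zip(peaks, peaks[1:])] + [len(spectrum)]
    for i in range(len(peaks)):
        for k in range(bounds[i], bounds[i + 1]):
            target = k + shifts[i]
            if 0 <= target < len(spectrum):
                out[target] += spectrum[k]
    return out

# Toy spectrum with two peaks; shift the first up by 1 bin and the second by 2.
spectrum = [complex(v) for v in [0, 1, 3, 1, 0, 0, 2, 5, 2, 0, 0, 0]]
mags = [abs(v) for v in spectrum]
peaks = find_peaks(mags)
shifted = shift_peaks(spectrum, peaks, [1, 2])
shifted_peaks = find_peaks([abs(v) for v in shifted])
```

Giving each peak its own shift is what makes effects such as harmonizing possible in this scheme: shifts proportional to the peak frequency change pitch, while a different shift per peak changes the spacing of the partials.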
DOI: https://doi.org/10.1109/ASPAA.1999.810857 | Published: 1999-10-17
Citations: 110
Studies of a wideband stereophonic acoustic echo canceler
P. Eneroth, T. Gansler, S. Gay, J. Benesty
In this paper a wideband stereophonic acoustic echo canceler is presented. The fundamental difficulty of stereophonic acoustic echo cancellation (SAEC) is described and an echo canceler based on a fast recursive least squares algorithm in a subband structure is proposed. This structure has been used in a real-time implementation, on which experiments have been performed. In the paper, simulation results of this implementation on real-life recordings, with 8 kHz bandwidth, are studied. The results clearly verify that the theoretical fundamental problem of SAEC also applies in real-life situations. They also show that more sophisticated adaptive algorithms are needed in the lower frequency regions than in the higher regions.
DOI: https://doi.org/10.1109/ASPAA.1999.810886 | Published: 1999-10-17
Citations: 13
Application of the phase vocoder to pitch-preserving synchronization of an audio stream to an external clock
R. Sussman, J. Laroche
The phase vocoder is usually presented as a high-quality solution for time-scale modification of signals. Its main advantages over the cheaper time-domain techniques include the high quality of the output for a wide range of input signal types (speech, music, noise), and the possibility of performing very large factor modifications (e.g., four-fold time-stretching or more). In this paper, we present two applications that require such extreme modification factors. We call the first one pitch-preserving audio scrubbing, in which a user can move a pointer along an audio track and hear the sound at the corresponding location without any pitch alteration. Because the user controls the playback location (and therefore the playback speed), and can very well stop at a given location, the required time-scale modification can involve a very large factor. The second application consists of synchronizing an audio stream to a video stream, while avoiding pitch alteration. For extreme slow-motion playback, the time-scaling operation required to preserve the pitch can also involve a very large factor. We address theoretical and practical issues related to pitch-preserving synchronization of an audio track. Techniques are discussed to allow freezing time in the phase-vocoder and avoid problems associated with very large factor modifications.
DOI: https://doi.org/10.1109/ASPAA.1999.810853 | Published: 1999-10-17
Citations: 6
On some derivations of Gibson's approach for speech enhancement
É. Grivel, M. Gabrea, M. Najim
This paper deals with Kalman filter-based enhancement of a speech signal embedded in colored noise, when using a single-microphone system. Several approaches using Kalman filtering have been developed. More particularly, Gibson et al. (1991) reported an iterative method based on the so-called "noise-free" state space model, which may imply the introduction of a coordinate transformation to perform Kalman filtering. The authors do not address the identification issue. We propose some derivations of this method through an identification step using subspace methods for identification, previously developed in the field of control by Van Overschee (1993). The methods proposed here are then compared with other Kalman-based approaches.
DOI: https://doi.org/10.1109/ASPAA.1999.810868 | Published: 1999-10-17
Citations: 0
A robustness analysis of 3D audio using loudspeakers
D. Ward, G. Elko
It is well known that the effectiveness of 3D audio systems is critically dependent on the listener's head being in a known location. In this paper we analyze the fundamental role played by the loudspeaker positions in determining the robustness of the crosstalk canceler. Based on an extremely simple head model, we derive straightforward expressions for the loudspeaker positions that optimize the system robustness, which is measured by matrix condition numbers. These derived optimum positions are then compared with empirically derived optimum positions obtained from actual HRTF (head-related transfer function) measurements. The results indicate that our analytical expressions accurately predict the optimum loudspeaker positions.
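The robustness measure mentioned in the abstract, the condition number of the loudspeaker-to-ear transfer matrix, can be illustrated with a minimal sketch. The head model here is an assumption made for illustration: two point receivers in free field with no head shadow, and a symmetric loudspeaker pair. It may differ from the model used in the paper. The function name, the default ear spacing and the loudspeaker distance are hypothetical. For a symmetric matrix [[a, b], [b, a]] the singular values are |a + b| and |a - b|, so no linear algebra library is needed.

```python
import cmath
import math

def crosstalk_condition_number(half_span_deg, freq_hz, speaker_dist_m=1.0,
                               ear_half_spacing_m=0.0875, c=343.0):
    """Condition number of the 2x2 free-field transfer matrix (assumed head model)."""
    theta = math.radians(half_span_deg)
    sx = speaker_dist_m * math.sin(theta)   # lateral loudspeaker offset
    sy = speaker_dist_m * math.cos(theta)   # loudspeaker distance in front
    r_ipsi = math.hypot(sx - ear_half_spacing_m, sy)    # path to the same-side ear
    r_contra = math.hypot(sx + ear_half_spacing_m, sy)  # path to the opposite ear
    k = 2.0 * math.pi * freq_hz / c
    a = cmath.exp(-1j * k * r_ipsi) / r_ipsi
    b = cmath.exp(-1j * k * r_contra) / r_contra
    s_sum, s_diff = abs(a + b), abs(a - b)
    smallest = min(s_sum, s_diff)
    return math.inf if smallest == 0.0 else max(s_sum, s_diff) / smallest

# Scan loudspeaker half-spans at 1 kHz and keep the best-conditioned one.
best_half_span = min(range(1, 90),
                     key=lambda deg: crosstalk_condition_number(deg, 1000.0))
```

A condition number close to 1 means small errors in the assumed head position are amplified little by the inverse filters. A large value signals a fragile setup at that frequency and loudspeaker span.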
DOI: https://doi.org/10.1109/ASPAA.1999.810882 | Published: 1999-10-17
Citations: 5
Multifeature audio segmentation for browsing and annotation
G. Tzanetakis, P. Cook
Indexing and content-based retrieval are necessary to handle the large amounts of audio and multimedia data that is becoming available on the Web and elsewhere. Since manual indexing using existing audio editors is extremely time-consuming, a number of automatic content analysis systems have been proposed. Most of these systems rely on speech recognition techniques to create text indices. On the other hand, very few systems have been proposed for automatic indexing of music and general audio. Typically these systems rely on classification and similarity-retrieval techniques and work in restricted audio domains. A somewhat different, more general approach for fast indexing of arbitrary audio data is the use of segmentation based on multiple temporal features combined with automatic or semi-automatic annotation. In this paper, a general methodology for audio segmentation is proposed. A number of experiments were performed to evaluate the proposed methodology and compare different segmentation schemes. Finally, a prototype audio browsing and annotation tool based on segmentation combined with existing classification techniques was implemented.
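The general idea of segmenting on multiple temporal features can be sketched as follows: compute a small feature vector per frame, then mark a boundary wherever consecutive vectors differ sharply. The abstract does not list the features or the detection rule, so the two features (RMS energy and zero-crossing rate), the Euclidean distance, the fixed threshold and the function names are all assumptions chosen for brevity.

```python
import math

def frame_features(samples, frame_len):
    """Per-frame (RMS energy, zero-crossing rate) over non-overlapping frames."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((rms, crossings / (frame_len - 1)))
    return feats

def segment_boundaries(feats, threshold):
    """Frame indices where the feature vector jumps by more than `threshold`."""
    return [i for i in range(1, len(feats))
            if math.dist(feats[i - 1], feats[i]) > threshold]

# Synthetic test signal: a quiet low-frequency tone followed by a loud,
# rapidly alternating signal. The change happens at sample 400.
tone = [0.1 * math.sin(2.0 * math.pi * n / 50.0) for n in range(400)]
buzz = [1.0 if n % 2 == 0 else -1.0 for n in range(400)]
features = frame_features(tone + buzz, 100)
boundaries = segment_boundaries(features, 0.5)
```

A practical system would normalise the features so that none dominates the distance, and would work on overlapping frames. The sketch skips both to stay short.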
DOI: https://doi.org/10.1109/ASPAA.1999.810860 | Published: 1999-10-17
Citations: 152
A systematic hybrid analog/digital audio coder
R. Barron, A. Oppenheim
This paper describes a signal coding solution for a hybrid channel that is the composition of two channels: a noisy analog channel through which a signal source is sent unprocessed and a secondary rate-constrained digital channel. The source is processed prior to transmission through the digital channel. Signal coding solutions for this hybrid channel are clearly applicable to the in-band on-channel (IBOC) digital audio broadcast (DAB) problem. We present the design of a perceptually-based subband audio coder, with complexity comparable to conventional coders, that exploits a signal at the receiver of the form y[n]=g[n]*x[n]+u[n], where x[n], g[n], and u[n] denote respectively the source, the impulse response of convolutional distortion, and additive Gaussian noise. Concepts from conventional subband coding, e.g. subband decomposition, quantization, bit allocation, and lossless signal coding, are tailored to exploit the analog signal at the receiver such that frequency-weighted mean-squared error is minimized.
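The receiver-side signal model stated in the abstract, y[n] = g[n]*x[n] + u[n] with * denoting convolution, can be simulated directly. The sketch below covers only this analog channel model and not the coder itself. The function name, the fixed random seed and the truncation of the output to the input length are illustrative assumptions.

```python
import random

def analog_channel(x, g, noise_std, seed=0):
    """Return y[n] = (g * x)[n] + u[n], with u[n] zero-mean Gaussian noise."""
    rng = random.Random(seed)
    y = []
    for n in range(len(x)):
        # Causal convolution of the source with the channel impulse response.
        filtered = sum(g[m] * x[n - m] for m in range(len(g)) if n - m >= 0)
        y.append(filtered + rng.gauss(0.0, noise_std))
    return y

# Noise-free check: a unit impulse through g returns g itself (zero-padded).
clean = analog_channel([1.0, 0.0, 0.0], [1.0, 0.5], noise_std=0.0)
# Noisy observation of a short source through the same hypothetical channel.
noisy = analog_channel([0.3, -0.2, 0.5, 0.1], [1.0, 0.5], noise_std=0.05)
```

In the setting the abstract describes, the decoder has access to a signal like `noisy` in addition to the digital bitstream, which is why the digital side only needs to encode what the analog observation does not already convey.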
DOI: https://doi.org/10.1109/ASPAA.1999.810843 | Published: 1999-10-17
Citations: 3
A speech feature based on Bark frequency warping-the non-uniform linear prediction (NLP) cepstrum
Yoon Kim, J.O. Smith
We propose a new method of obtaining features from speech signals for robust analysis and recognition: the non-uniform linear prediction (NLP) cepstrum. The objective is to derive a representation that suppresses speaker-dependent characteristics while preserving the linguistic quality of speech segments. The analysis is based on two principles. First, Bark frequency warping is performed on the LP spectrum to emulate the auditory spectrum. While widely used methods such as mel-frequency and PLP analysis use the FFT spectrum as their basis for warping, the NLP analysis uses the LP-based vocal-tract spectrum with glottal effects removed. Second, all-pole modeling (LP) is used before and after the warping. The pre-warp LP is used to first obtain the vocal-tract spectrum, while the post-warp LP is performed to obtain a smoothed, two-peak model of the warped spectrum. Experiments were conducted to test the effectiveness of the proposed feature in the case of identification/discrimination of vowels uttered by multiple speakers using linear discriminant analysis (LDA), and frame-based vowel recognition with a statistical model. In both cases, the NLP analysis was shown to be an effective tool for speaker-independent speech analysis/recognition applications.
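To make the Bark warping step concrete, the sketch below maps frequencies to the Bark scale and builds a grid of frequencies that is uniform in Bark, at which a spectrum could be resampled. It uses the Zwicker and Terhardt closed-form approximation of the Bark scale, which is a swapped-in, generic formula and not necessarily the warping used by the authors. The inverse mapping by bisection and the function names are assumptions.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_to_hz(z, f_max=24000.0, iterations=60):
    """Invert hz_to_bark by bisection (the mapping is monotonically increasing)."""
    lo, hi = 0.0, f_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bark_spaced_frequencies(n_points, f_nyquist):
    """Frequencies in Hz that are evenly spaced on the Bark scale up to f_nyquist."""
    z_max = hz_to_bark(f_nyquist)
    return [bark_to_hz(z_max * i / (n_points - 1), f_nyquist) for i in range(n_points)]

# Hypothetical use: 5 Bark-uniform sample points for a 16 kHz sampling rate.
grid = bark_spaced_frequencies(5, 8000.0)
```

Evaluating a spectrum at the frequencies in `grid` compresses the high-frequency region relative to the low-frequency region. In the pipeline the abstract describes, the spectrum being warped is the LP-based vocal-tract spectrum, not the FFT spectrum.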
DOI: https://doi.org/10.1109/ASPAA.1999.810867 | Published: 1999-10-17
Citations: 10
Improvements to the switched parametric and transform audio coder
S. Levine, J.O. Smith
We introduce improvements to previous sines+transients+noise audio modeling systems, including new sinusoidal trajectory selection and quantization procedures. In a previous work by Levine and Smith (see Proc. Int. Conf. Acoustics, Speech, and Signal Processing, Phoenix, 1999), the audio is first segmented into transient and non-transient regions. The transient regions are modeled using traditional transform coding techniques, while the non-transient regions are modeled using parametric sines plus noise modeling. Because such a system contains a mix of parametric and non-parametric techniques, compressed-domain processing such as time-scale modification is possible.
DOI: https://doi.org/10.1109/ASPAA.1999.810845 | Published: 1999-10-17
Citations: 5