
1995 International Conference on Acoustics, Speech, and Signal Processing: Latest Papers

Nonlinear recovery of sparse signals from narrowband data
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.480462
R. Gopinath
This paper describes the connection between a certain signal recovery problem and the decoding of Reed-Solomon codes. It is shown that any algorithm for decoding Reed-Solomon codes (over finite fields) can be used to recover wide-band signals (over the real/complex field) from narrow-band information. It also shows that a signal with at most N_t frequency samples can be recovered from any contiguous band of 2N_t frequency samples.
Citations: 1
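To make the sparse-recovery claim in the Gopinath abstract concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes the signal has exactly K nonzero time samples (K plays the role of N_t), and it uses a Prony-style annihilating filter, the real/complex-field analogue of the error-locator polynomial in Reed-Solomon decoding. The function name `recover_sparse` and the demo values are hypothetical.

```python
import numpy as np

def recover_sparse(band, k0, N, K):
    """Recover a length-N signal with exactly K nonzero samples
    from the 2K contiguous DFT bins k0 .. k0+2K-1 (assumed noise-free)."""
    band = np.asarray(band, dtype=complex)
    # Annihilating filter h (h[0] = 1): sum_{m=0..K} h[m] * X[k-m] = 0.
    T = np.array([[band[i + K - m] for m in range(1, K + 1)] for i in range(K)])
    h = np.linalg.solve(T, -band[K:2 * K])
    # Roots of z^K + h[1] z^(K-1) + ... + h[K] are exp(-2j*pi*n/N),
    # one per support location n (the "error locations").
    roots = np.roots(np.concatenate(([1.0], h)))
    support = np.round(-np.angle(roots) * N / (2 * np.pi)).astype(int) % N
    # Amplitudes from a Vandermonde least-squares fit on the observed band.
    k = np.arange(k0, k0 + 2 * K)
    V = np.exp(-2j * np.pi * np.outer(k, support) / N)
    amps, *_ = np.linalg.lstsq(V, band, rcond=None)
    x = np.zeros(N, dtype=complex)
    x[support] = amps
    return x

# Demo with hypothetical values: 3 nonzero samples, 6 observed bins.
N, K, k0 = 64, 3, 10
x = np.zeros(N, dtype=complex)
x[[5, 20, 41]] = [1.0, -2.0, 0.5]
X = np.fft.fft(x)
x_hat = recover_sparse(X[k0:k0 + 2 * K], k0, N, K)
```

The sketch is nonlinear in the data because the support is found from polynomial roots; only the final amplitude fit is linear.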
Prediction of sound pressure fields by Picard-iterative BEM based on holographic interferometry
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.480125
H. Klingele, H. Steinbichler
Holographic interferometry offers amplitude data with a high spatial resolution, which can be used as the vibration boundary condition for calculating the corresponding sound pressure field. When investigating objects with arbitrary 3D shape, this requires contour measuring, performing holographic interferometry for three axes of freedom, combining contour and vibration data into a boundary element (BE) model, and then solving the discretized Helmholtz-Kirchhoff integral equation for the surface sound pressure. The latter is done by means of the Picard-iterative boundary element method (PIBEM), which does not need matrix operations at all and is therefore also capable of treating large BE models arising from small bending wavelengths at high vibration frequencies. An experimental verification of this method by microphone measurements in an anechoic chamber is presented for a cylindrical object.
Citations: 1
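The Picard iteration mentioned in the Klingele and Steinbichler abstract can be illustrated on a generic discretized integral equation of the second kind, p = b + K p. The sketch below is an assumption-laden toy: the kernel is a made-up, real-valued contraction, not the Helmholtz-Kirchhoff kernel, and `picard_solve` and `apply_K` are hypothetical names. It only shows how the unknown is updated by repeated substitution, with the operator applied element by element and no matrix stored or inverted.

```python
import numpy as np

def picard_solve(apply_K, b, tol=1e-10, max_iter=500):
    """Solve p = b + K p by fixed-point (Picard) iteration."""
    p = b.copy()
    for _ in range(max_iter):
        p_next = b + apply_K(p)
        if np.linalg.norm(p_next - p) <= tol * np.linalg.norm(p_next):
            return p_next
        p = p_next
    raise RuntimeError("Picard iteration did not converge")

# Toy surface: n collocation points on a line (hypothetical geometry).
n = 50
pts = np.linspace(0.0, 1.0, n)

def apply_K(p):
    """Apply the toy integral operator one collocation point at a time."""
    out = np.empty_like(p)
    for i in range(n):
        kernel_row = 0.5 * np.exp(-np.abs(pts[i] - pts))  # evaluated on the fly
        out[i] = np.sum(kernel_row * p) / n
    return out

b = np.sin(2 * np.pi * pts)   # stand-in for the measured boundary data
p = picard_solve(apply_K, b)
```

The iteration converges here because the toy operator has norm at most 0.5; whether and how fast it converges for a real acoustic kernel depends on the problem and is not something this sketch establishes.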
Noisy speech recognition using robust inversion of hidden Markov models
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.479385
S. Moon, Jenq-Neng Hwang
The hidden Markov model (HMM) inversion algorithm is proposed and applied to robust speech recognition for general types of mismatched conditions. The Baum-Welch HMM inversion algorithm is a dual procedure to the Baum-Welch HMM reestimation algorithm, which is the most widely used speech recognition technique. The forward training of an HMM, based on the Baum-Welch reestimation, finds the model parameters λ that optimize some criterion, usually maximum likelihood (ML), with given speech inputs s. On the other hand, the inversion of an HMM finds speech inputs s that optimize some criterion with given model parameters λ. The performance of the proposed HMM inversion, in conjunction with HMM reestimation, for robust speech recognition under additive noise corruption and microphone mismatch conditions is favorably compared with other noisy speech recognition techniques, such as the projection-based first-order cepstrum normalization (FOCN) and the robust minimax (MINIMAX) classification techniques.
Citations: 21
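The duality described in the Moon and Hwang abstract (reestimation updates λ for fixed s, inversion updates s for fixed λ) can be sketched for a very simple case. The code below is an illustration under strong assumptions, not the paper's procedure: scalar observations, Gaussian emissions with unit variance, and an EM-style update in which each input moves to the posterior-weighted mean of the state means. All names and numbers are hypothetical.

```python
import numpy as np

def emission(s, means):
    """b[t, i] = N(s[t]; means[i], 1)."""
    return np.exp(-0.5 * (s[:, None] - means[None, :]) ** 2) / np.sqrt(2 * np.pi)

def forward_backward(s, pi, A, means):
    """Scaled forward-backward: state posteriors and log p(s | lambda)."""
    B = emission(s, means)
    T, n = B.shape
    alpha, beta, c = np.zeros((T, n)), np.ones((T, n)), np.zeros(T)
    alpha[0] = pi * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, np.log(c).sum()

def invert_hmm(s, pi, A, means, n_iter=10):
    """Keep the model fixed and move the inputs toward higher likelihood."""
    s = s.copy()
    history = []
    for _ in range(n_iter):
        gamma, loglik = forward_backward(s, pi, A, means)
        history.append(loglik)
        s = gamma @ means  # maximizes the auxiliary function for unit variances
    return s, history

pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([0.0, 3.0])
s0 = np.array([0.5, 1.2, 2.0, 2.8, 3.5])
s_inv, history = invert_hmm(s0, pi, A, means)
```

Left unconstrained, this update simply pulls the inputs toward the state means; a practical system would presumably restrict how far the inputs may move from the observed speech.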
Discrete scale transform for signal analysis
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.479859
E. J. Zalubas, W. J. Williams
The scale transform introduced by Cohen (see IEEE Trans. Signal Processing, vol. 41, pp. 3275-3292, December 1993) is a special case of the Mellin transform. The scale transform has mathematical properties desirable for comparison of signals for which scale variation occurs. In addition to the scale invariance property of the Mellin transform, many properties specific to the scale transform have been presented. A procedure is presented for complete implementation of the scale transformation for discrete signals. This complements discrete Mellin transforms and delineates steps whose implementation is specific to the scale transform.
Citations: 20
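The scale-invariance property behind the Zalubas and Williams abstract can be checked numerically. The sketch below uses the standard definition D(c) = (2π)^(-1/2) ∫ f(t) t^(-jc - 1/2) dt over t > 0 and evaluates it by a plain Riemann sum on a logarithmic time grid. This is a direct quadrature chosen for clarity, not the discrete procedure of the paper; the test signal and grid limits are hypothetical choices.

```python
import numpy as np

def scale_transform(f, c, u_min=-12.0, u_max=6.0, n=8192):
    """Approximate D(c) = (2*pi)**-0.5 * integral_0^inf f(t) t**(-1j*c - 0.5) dt.

    With t = exp(u) the integral becomes the Fourier transform of
    f(exp(u)) * exp(u / 2), which is summed on a uniform grid in u.
    """
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    g = f(np.exp(u)) * np.exp(0.5 * u)
    kernel = np.exp(-1j * np.outer(c, u))   # one row per requested c
    return (kernel @ g) * du / np.sqrt(2 * np.pi)

f = lambda t: t * np.exp(-t)                        # hypothetical test signal
a = 2.0
f_scaled = lambda t: np.sqrt(a) * f(a * t)          # energy-preserving scaling

c = np.linspace(-5.0, 5.0, 21)
D_f = scale_transform(f, c)
D_scaled = scale_transform(f_scaled, c)
```

Scaling the signal changes only the phase of the transform, by a factor a^(jc), so the magnitudes of `D_f` and `D_scaled` agree up to quadrature error.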
Supplementary orthogonal cepstral features
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.479609
K. Assaleh
A new set of LP-derived features is introduced. The concept of these features is motivated by the power sum formulation of the LP cepstrum. Because the LP model implies that the resulting poles are either real or occur in complex conjugate pairs, the power sum of the poles is equivalent to the power sum of their real components. Therefore, the LP cepstrum is associated with the power sum of the real component of the LP poles. This fact is utilized in deriving a new set of features that is associated with the imaginary components of the LP poles. The author refers to this new set of features as the sepstral coefficients. It is found that the sepstral coefficients and cepstral coefficients are relatively uncorrelated. Hence, they can be used jointly to improve the performance of pattern classification applications where cepstral features are usually used. The author presents some preliminary results on speaker identification experiments.
Citations: 3
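The power sum formulation that motivates the Assaleh abstract can be verified numerically. For an all-pole model 1/A(z) with poles z_i, the cepstrum is c_n = (1/n) Σ z_i^n, and the imaginary parts cancel across conjugate pairs. The sketch below compares this with the usual LP-to-cepstrum recursion. The pole values are hypothetical, and the sepstral coefficients themselves are not computed because the abstract does not give their exact definition.

```python
import numpy as np

def cepstrum_from_poles(poles, n_coef):
    """c_n = (1/n) * sum_i z_i**n; only the real parts survive."""
    return np.array([np.sum(poles ** n).real / n for n in range(1, n_coef + 1)])

def cepstrum_from_lp(a, n_coef):
    """Recursion for A(z) = 1 + sum_k a[k] z^-k, with a[0] = 1."""
    p = len(a) - 1
    c = np.zeros(n_coef + 1)
    for n in range(1, n_coef + 1):
        a_n = a[n] if n <= p else 0.0
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -a_n - acc / n
    return c[1:]

# Hypothetical stable pole set: two conjugate pairs and one real pole.
poles = np.array([
    0.9 * np.exp(1j * 0.3 * np.pi), 0.9 * np.exp(-1j * 0.3 * np.pi),
    0.7 * np.exp(1j * 0.6 * np.pi), 0.7 * np.exp(-1j * 0.6 * np.pi),
    0.5,
])
a = np.poly(poles).real            # coefficients of A(z)

c_power_sum = cepstrum_from_poles(poles, 12)
c_recursion = cepstrum_from_lp(a, 12)
imag_residue = max(abs(np.sum(poles ** n).imag) for n in range(1, 13))
```

`imag_residue` stays at rounding-error level, which is the cancellation the abstract refers to: the information carried by the imaginary components is absent from the cepstrum.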
Some results with a trainable speech translation and understanding system
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.479286
Víctor M. Jiménez, A. Castellanos, E. Vidal
The problems of limited-domain spoken language translation and understanding are considered. A standard continuous speech recognizer is extended to use automatically learnt finite-state transducers as translation models. Understanding is considered as a particular case of translation where the target language is a formal language. Of the different approaches compared, the best results are obtained with a fully integrated approach, in which the input language acoustic and lexical models, and (N-gram) language models of input and output languages, are embedded into the learnt transducers. Optimal search through this global network obtains the best translation for a given input acoustic signal.
Citations: 16
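To show what a finite-state transducer used as a translation model looks like, as in the Jiménez, Castellanos and Vidal abstract, here is a hand-written toy. It is deterministic and works on words, whereas the system in the abstract learns its transducers automatically and searches them jointly with acoustic, lexical and language models. The states, vocabulary and the name `transduce` are all hypothetical.

```python
# Transitions: (state, input_word) -> (next_state, output_words).
# Output can be delayed until enough input has been read to decide word order.
TRANSITIONS = {
    (0, "un"): (1, ["a"]),
    (1, "circulo"): (2, []),
    (1, "cuadrado"): (3, []),
    (2, "grande"): (4, ["large", "circle"]),
    (2, "pequeno"): (4, ["small", "circle"]),
    (3, "grande"): (4, ["large", "square"]),
    (3, "pequeno"): (4, ["small", "square"]),
}
FINAL_STATES = {4}

def transduce(words):
    """Return the output word list, or None if the input is not accepted."""
    state, output = 0, []
    for word in words:
        if (state, word) not in TRANSITIONS:
            return None
        state, emitted = TRANSITIONS[(state, word)]
        output.extend(emitted)
    return output if state in FINAL_STATES else None

translation = transduce("un circulo grande".split())
rejected = transduce(["un", "grande"])
```

Replacing the output words with symbols of a formal query language turns the same structure into an understanding model, which is the special case the abstract describes.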
Uniqueness study of measurements obtainable with an electromagnetic vector sensor
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.479928
Kah-Chye Tan, K. Ho, A. Nehorai
We investigate the linear dependence of the steering vectors of one electromagnetic vector sensor. We show that every 3 steering vectors with distinct DOAs are linearly independent. We also show that 4 steering vectors with distinct DOAs are linearly independent if the ellipticity angles of the signals associated with any 2 of the 4 steering vectors are distinct. We then establish that 5 steering vectors are linearly independent if exactly 2 or 3 of them correspond to circularly polarized signals with the same spin direction. Finally, we demonstrate that, given any 5 steering vectors, for any DOA there exists a steering vector that is linearly dependent on those 5 steering vectors.
Citations: 5
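The first result in the Tan, Ho and Nehorai abstract (any 3 steering vectors with distinct DOAs are linearly independent) can be spot-checked numerically. The sketch assumes a generic plane-wave model: the electric field e is transverse to the unit direction vector u, and the magnetic field is h = u × e up to a constant. The paper's own angle parameterization and sign convention may differ, and the function name and test angles are hypothetical.

```python
import numpy as np

def steering_vector(azimuth, elevation, alpha, beta):
    """Six-component [e; h] response of one vector sensor to a plane wave.

    alpha and beta are complex weights on two transverse unit vectors
    and together encode the polarization.
    """
    ca, sa = np.cos(azimuth), np.sin(azimuth)
    ce, se = np.cos(elevation), np.sin(elevation)
    u = np.array([ce * ca, ce * sa, se])        # unit direction vector
    v1 = np.array([-sa, ca, 0.0])               # transverse basis vector 1
    v2 = np.array([-se * ca, -se * sa, ce])     # transverse basis vector 2
    e = alpha * v1 + beta * v2
    h = np.cross(u, e)
    return np.concatenate([e, h])

# Three distinct DOAs, all with the same circular polarization.
doas = [(0.1, 0.2), (1.0, -0.3), (2.5, 0.7)]
vectors = np.column_stack([steering_vector(az, el, 1.0, 1j) for az, el in doas])
rank = np.linalg.matrix_rank(vectors)
```

A rank of 3 for one choice of angles is only a spot check, not a proof. The last claim of the abstract also has a simple dimension-counting reading: the steering vectors for one DOA span a 2-dimensional subspace of the 6-dimensional complex space, which must intersect the span of any 5 steering vectors.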
On the choice of wavelet filters for audio compression
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.480413
P. Philippe, F. M. D. Saint-Martin, L. Mainard
We address the issue of choosing an optimal wavelet packet transform for audio compression. We present a comparison method based on a perceptual approach, which provides an entropic bit-rate for "transparent" coding of a given audio signal. The test with different wavelets leads to the conclusion that the most significant synthesis criterion for audio compression is the so-called "coding gain", while frequency selectivity, regularity and orthogonality seem less relevant.
Citations: 9
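For readers unfamiliar with the "coding gain" named in the Philippe, Saint-Martin and Mainard abstract, the sketch below computes the textbook quantity for an orthonormal transform with equal-width subbands: the ratio of the arithmetic to the geometric mean of the subband variances. This is an assumption about which definition is meant. A non-uniform wavelet packet tree would need bandwidth weights, and a filter-design criterion would be evaluated from the filters and a signal model rather than from measured variances.

```python
import numpy as np

def coding_gain(subband_variances):
    """Arithmetic mean over geometric mean of equal-width subband variances."""
    v = np.asarray(subband_variances, dtype=float)
    arithmetic_mean = v.mean()
    geometric_mean = np.exp(np.log(v).mean())
    return arithmetic_mean / geometric_mean

flat = coding_gain([1.0, 1.0, 1.0, 1.0])   # white spectrum: nothing to gain
peaky = coding_gain([4.0, 1.0])            # mean 2.5, geometric mean 2.0
```

The gain equals 1 when all subbands carry the same variance and grows as the transform concentrates energy into fewer subbands.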
Analysis of acoustic-phonetic variations in fluent speech using TIMIT
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.479399
Don X. Sun, L. Deng
We propose a hierarchically structured analysis of variance (ANOVA) method to analyze, in a quantitative manner, the contributions of various identifiable factors to the overall acoustic variability exhibited in fluent speech data of TIMIT processed in the form of mel-frequency cepstral coefficients. The results of the analysis show that the greatest acoustic variability in TIMIT data is explained by the difference among distinct phonetic labels in TIMIT, followed by the phonetic context difference given a fixed phonetic label. The variability among sequential sub-segments within each TIMIT-defined phonetic segment is found to be significantly greater than the gender, dialect region, and speaker factors. Our results provide useful insights into the roles of various components of speech recognizers in contributing to the ultimate speech recognition performance.
Citations: 18
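The hierarchical decomposition described in the Sun and Deng abstract can be illustrated with a two-level nested sum-of-squares split: variation between top-level groups (for example phonetic labels), variation between sub-groups inside each group (for example contexts given a label), and the remaining within-cell variation. The sketch handles one scalar feature and two factors only. The data, labels and function name are made up for illustration and are not TIMIT values.

```python
import numpy as np

def nested_anova(values, level1, level2):
    """Split the total sum of squares for a factor nested inside another."""
    values = np.asarray(values, dtype=float)
    level1, level2 = np.asarray(level1), np.asarray(level2)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_level1 = ss_level2 = ss_residual = 0.0
    for a in np.unique(level1):
        in_a = level1 == a
        mean_a = values[in_a].mean()
        ss_level1 += in_a.sum() * (mean_a - grand_mean) ** 2
        for b in np.unique(level2[in_a]):
            in_ab = in_a & (level2 == b)
            mean_ab = values[in_ab].mean()
            ss_level2 += in_ab.sum() * (mean_ab - mean_a) ** 2
            ss_residual += np.sum((values[in_ab] - mean_ab) ** 2)
    return ss_total, ss_level1, ss_level2, ss_residual

# Hypothetical scalar feature values with phone and context labels.
values = [1.0, 1.2, 0.8, 1.9, 2.1, 5.0, 5.3, 4.1, 4.4, 4.0]
phones = ["aa", "aa", "aa", "aa", "aa", "iy", "iy", "iy", "iy", "iy"]
contexts = ["b_", "b_", "b_", "d_", "d_", "b_", "b_", "d_", "d_", "d_"]
ss_total, ss_phone, ss_context, ss_residual = nested_anova(values, phones, contexts)
```

Dividing each component by `ss_total` gives the share of variability attributable to each level, which is the kind of ranking the abstract reports.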
Co-channel speaker separation
Pub Date : 1995-05-09 DOI: 10.1109/ICASSP.1995.479822
D. Morgan, E. George, L. Lee, Stephen M. Kay
This paper describes a system for the automatic separation of two-talker co-channel speech. This system is based on a frame-by-frame speaker separation algorithm that exploits a pitch estimate of the stronger talker derived from the co-channel signal. The concept underlying this approach is to recover the stronger talker's speech by enhancing harmonic frequencies and formants given a multi-resolution pitch estimate. The weaker talker's speech is obtained from the residual signal created when the harmonics and formants of the stronger talker are suppressed. A maximum likelihood speaker assignment algorithm is used to place the recovered frames from the target and interfering talkers in separate channels. The system has been tested at target-to-interferer ratios (TIRs) from -18 to 18 dB with human listening tests, and with machine-based tests employing a keyword spotting system on the Switchboard Corpus for target talkers at 6, 12, and 18 dB TIR.
Citations: 21
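The core idea in the Morgan, George, Lee and Kay abstract, keeping the harmonics of the stronger talker and taking the residual as the weaker talker, can be sketched on a single synthetic frame. The sketch makes strong simplifying assumptions: the pitch of the stronger talker is already known, both talkers are exactly periodic within the frame, and only harmonic bins are used (no formant enhancement, no multi-resolution pitch estimate, no speaker assignment). All signal parameters and names are hypothetical.

```python
import numpy as np

def split_by_harmonics(frame, f0, fs, half_width_hz=1.0):
    """Keep FFT bins near multiples of f0; the residual is the other talker."""
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # Distance of every bin from the nearest harmonic of f0 (DC excluded).
    distance = np.abs(freqs - f0 * np.round(freqs / f0))
    mask = (distance <= half_width_hz) & (freqs >= f0 / 2)
    strong_hat = np.fft.irfft(spectrum * mask, n=len(frame))
    return strong_hat, frame - strong_hat

fs, n = 8000, 800                      # 100 ms frame, 10 Hz bin spacing
t = np.arange(n) / fs
strong = sum(np.cos(2 * np.pi * 100 * h * t) / h for h in range(1, 6))
weak = 0.3 * sum(np.cos(2 * np.pi * 130 * h * t) for h in range(1, 4))
mixture = strong + weak

strong_hat, weak_hat = split_by_harmonics(mixture, f0=100.0, fs=fs)
```

The split is exact here only because no harmonic of the weaker talker falls on a harmonic of the stronger one; real speech has overlapping harmonics and time-varying pitch.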