Estimating the number of sinusoids in additive white-noise
J. Fuchs
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169582
For a random process that can be modeled as a sum of real sinusoids in white noise, we address the problem of estimating the number of sinusoids. The test we propose uses the eigen-decomposition of the estimated autocorrelation matrix and is based on matrix perturbation analysis. The estimator is shown to resolve closely spaced sinusoids at quite low signal-to-noise ratios.
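To make the eigen-decomposition idea concrete, here is a minimal NumPy sketch of a much simpler detector in the same family: it forms an estimated autocorrelation matrix, eigen-decomposes it, and counts eigenvalues standing well above the noise floor. This illustrates the general approach only, not Fuchs's perturbation-analysis test; the order m and the gap threshold are illustrative choices.

```python
# Sketch (not Fuchs's exact test): count real sinusoids from the eigenvalues
# of an estimated autocorrelation matrix. Each real sinusoid contributes two
# dominant eigenvalues; the rest cluster near the noise power.
import numpy as np

def estimate_num_sinusoids(x, m=12, gap=3.0):
    n = len(x)
    # Biased autocorrelation estimates r[0..m-1]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(m)])
    R = np.array([[r[abs(i - j)] for j in range(m)] for i in range(m)])
    lam = np.linalg.eigvalsh(R)[::-1]        # eigenvalues, descending
    noise_floor = lam[-1]                    # crude noise-power estimate
    k = int(np.sum(lam > gap * noise_floor)) # eigenvalues above the floor
    return k // 2                            # two eigenvalues per real sinusoid

# Example: two sinusoids in white noise
rng = np.random.default_rng(0)
t = np.arange(2000)
x = np.sin(0.6 * t) + 0.8 * np.sin(0.9 * t) + 0.3 * rng.standard_normal(2000)
print(estimate_num_sinusoids(x))             # expected: 2
```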
{"title":"Estimating the number of sinusoids in additive white-noise","authors":"J. Fuchs","doi":"10.1109/ICASSP.1987.1169582","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169582","url":null,"abstract":"For a random process that can be modeled as a sum of real sinusoids in white noise, we address the problem of the estimation of the number of sinusoids. The test we propose uses the eigen-decomposition of the estimated autocorrelation matrix and is based on matrix perturbation analysis. The estimator is shown to resolve closely spaced sinusoids at quite low signal -to- noise ratios.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122362172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A transform based covariance differencing approach to bearing estimation
S. Prasad, Ronald T. Williams, Arijit K. Mahalanabis, L. Sibul
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169850
In recent years a new and very powerful technique for parameter estimation - the eigenstructure, or signal subspace, method - has been developed. Eigenstructure algorithms are closely related to Pisarenko's method for estimating the frequencies of sinusoids in white Gaussian noise. In theory they yield asymptotically unbiased estimates of arbitrarily close parameters, independent of the signal-to-noise ratio (SNR). Although signal subspace methods have proven to be powerful tools, they are not without drawbacks. An important weakness of all signal subspace algorithms is their need to know the noise covariance explicitly. The important problem of developing signal-subspace-based procedures for signals in noise fields with unknown covariance has not been satisfactorily addressed. We propose a solution to the problem of direction-of-arrival (DOA) estimation for a broad class of unknown noise fields, and then briefly discuss other important estimation problems to which modified versions of this procedure can be applied.
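A toy NumPy illustration of the covariance-differencing idea (hedged: this is not the authors' transform, and the array geometry, noise model, and persymmetry assumption are ours for the demo): if the unknown noise covariance Q satisfies Q = J Q* J, with J the exchange matrix, then R - J R* J cancels the noise exactly, and for a non-uniform array the signal contribution survives.

```python
import numpy as np

d = np.array([0.0, 1.0, 2.5, 3.5, 5.0, 6.2, 7.1, 9.0])  # sensor positions (half-wavelengths)
m = len(d)
a = np.exp(1j * np.pi * d * np.sin(np.deg2rad(20.0)))    # steering vector, DOA = 20 degrees

J = np.eye(m)[::-1]                                      # exchange (flip) matrix
C = np.diag(np.linspace(0.5, 2.0, m)).astype(complex)    # some unknown noise covariance
Q = 0.5 * (C + J @ C.conj() @ J)                         # force persymmetry: Q = J Q* J
R = np.outer(a, a.conj()) + Q                            # unit-power source plus noise

dR = R - J @ R.conj() @ J                                # the noise term cancels here
print(np.linalg.matrix_rank(dR))                         # 2: dR spans a and J a*
```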
{"title":"A transform based covariance differencing approach to bearing estimation","authors":"S. Prasad, Ronald T. Williams, Arijit K. Mahalanabis, L. Sibul","doi":"10.1109/ICASSP.1987.1169850","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169850","url":null,"abstract":"In recent years a new, and very powerful technique for parameter estimation - the eigenstructure, or signal subspace method - has been developed. Eigenstructure algorithms are closely related to Pisarenko's method for estimating the frequencies of sinusoids in white Gaussian noise. In theory they yield asymptotically unbiased estimates of arbitrarily close parameters, independent of the signal-to-noise ratio (SNR). Although signal subspace methods have proven to be powerful tools, they are not without drawbacks. An important weakness of all signal subspace algorithmis their need to know the noise covariance explicitly. The important problem of developing signal subspace based procedures for signals in noise fields with unknown covariance has not been satisfactorily addressed. It is our intent to propose a solution to the problem of direction-of-arrival (DOA) estimation for a broad class of unknown noise fields. We will then briefly discuss other important estimation problems for which modified versions of this procedure can be applied.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129250939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable region vector quantization, space warping and speech/image compression
Y. Matsuyama
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169359
Algorithms for vector quantization of variable-region data are given, and the design iteration is proved to converge. The key issue is the optimization of the region shape with respect to the vector quantization codebook; the presented design method is thus a nontrivial extension of ordinary vector quantizer design, which subsumes the classical Lloyd-Max algorithm. First, the main algorithm is given without introducing any physical entity, so the method is applicable to any data, including speech and images, as long as the quantization distortion is defined. In the speech coding case, which is the main body of this paper, the region shape optimization is interpreted as epoch interval adjustment. Selecting the adjusted epochs with respect to the vector quantization codebook considerably reduces the quantizing distortion, which enables very-low-rate speech compression. Finally, the image coding case is formulated and a convergence problem is discussed.
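For context, here is a minimal sketch of the ordinary generalized-Lloyd (k-means style) design loop that such methods extend; the variable-region shape-optimization step itself is not shown, and the squared-error distortion and parameters are illustrative.

```python
import numpy as np

def design_codebook(data, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # Nearest-neighbor partition under squared-error distortion
        dist = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # Centroid update; neither step can increase the average
        # distortion, which is why the iteration converges
        for j in range(k):
            pts = data[labels == j]
            if len(pts):
                codebook[j] = pts.mean(0)
    return codebook

data = np.random.default_rng(1).normal(size=(1000, 4))
cb = design_codebook(data, k=8)
```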
{"title":"Variable region vector quantization, space warping and speech/Image compression","authors":"Y. Matsuyama","doi":"10.1109/ICASSP.1987.1169359","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169359","url":null,"abstract":"Algorithms for vector quantization of variable region data are given. The design iteration is proved to converge. An important issue here is the optimization step of the region shape with respect to the vector quantization codebook. Thus, the presented design method is a nontrivial extention of ordinary vector quantizer design which contains the classical Lloyd-Max algorithm. First, the main algorithm is given without introducing any physical entity. Therefore, the method is applicable to any data including speech and image as long as the quantization distortion is defined. In the speech coding case, which is the main body of this paper, the region shape optimization is interpreted as the epoch interval adjustment. The selection of the adjusted epochs with respect to the vector quantization codebook considerably reduces the quantizing distortion. This enables very-low-rate speech compression. Then, the image coding case is formulated and some convergence problem is discussed.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126691937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speech recognition with very large size dictionary
B. Mérialdo
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169731
This paper proposes a new strategy, Multi-Level Decoding (MLD), that allows a Very Large Size Dictionary (VLSD; more than 100,000 words) to be used in speech recognition. MLD proceeds in three steps:
- a Syllable Match procedure uses an acoustic model to build a list of the most probable syllables matching the acoustic signal from a given time frame;
- from this list, a Word Match procedure uses the dictionary to build partial word hypotheses;
- a Sentence Match procedure then uses a probabilistic language model to build partial sentence hypotheses until complete sentences are found.
An original matching algorithm is proposed for the Syllable Match procedure. The strategy is tested on a dictation task on French texts with two dictionaries: one composed of the 10,000 most frequent words, the other of 200,000 words. The word error rate with 10,000 words is 17.3%; if errors due to lack of coverage are not counted, it falls to 10.6%. With 200,000 words the error rate is 12.7%.
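As a hedged illustration of the Word Match step (the tiny dictionary, syllabification, and scores below are invented for the example; this is not Mérialdo's algorithm), partial word hypotheses can be formed by checking scored syllable sequences against a syllabified dictionary:

```python
from itertools import product

# Syllabified dictionary and per-frame syllable hypotheses with log-scores
dictionary = {("bon", "jour"): "bonjour", ("par", "ler"): "parler"}
syllable_lists = [[("bon", -1.2), ("pon", -2.0)], [("jour", -0.8), ("chou", -2.5)]]

word_hyps = []
for combo in product(*syllable_lists):
    sylls = tuple(s for s, _ in combo)
    score = sum(sc for _, sc in combo)
    if sylls in dictionary:                  # keep only sequences that form words
        word_hyps.append((dictionary[sylls], score))
print(sorted(word_hyps, key=lambda h: -h[1]))   # [('bonjour', -2.0)]
```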
{"title":"Speech recognition with very large size dictionary","authors":"B. Mérialdo","doi":"10.1109/ICASSP.1987.1169731","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169731","url":null,"abstract":"This paper proposes a new strategy, the Multi-Level Decoding (MLD), that allows to use a Very Large Size Dictionary (VLSD, size more than 100,000 words) in speech recognition. MLD proceeds in three steps:bulleta Syllable Match procedure uses an acoustic model to build a list of the most probable syllables that match the acoustic signal from a given time frame.bulletfrom this list, a Word Match procedure uses the dictionary to build partial word hypothesis.bulletthen a Sentence Match procedure uses a probabilistic language model to build partial sentence hypothesis until total sentences are found. An original matching algorithm is proposed for the Syllable Match procedure. This strategy is experimented on a dictation task of French texts. Two different dictionaries are tested,bulletone composed of the 10,000 most frequent words,bulletthe other composed of 200,000 words. The recognition results are given and compared. The error rate on words with 10,000 words is 17.3%. If the errors due to the lack of coverage are not counted, the error rate with 10,000 words is reduced to 10.6%. The error rate with 200,000 words is 12.7%.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121806682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vector predictive quantization of the spectral parameters for low rate speech coding
Y. Shoham
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169360
Vector Predictive Quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra, using a predictor codebook. The residual spectrum is coded by a residual codebook. The system operates in the log-spectral domain using a sampled version of the spectral envelope. Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 kbit/s CELP coder by a VPQ system allows the rate assigned to the LPC data to be reduced from 1.8 kbit/s to 1.0 kbit/s without any obvious difference in perceptual quality.
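A minimal sketch of the VPQ encode/decode path under stated assumptions: a codebook of predictor matrices predicts the current log-spectrum (from a single past spectrum here, for simplicity; the paper uses several), and a residual codebook quantizes what the predictor misses. Codebook sizes, contents, and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pred, n_res = 16, 8, 64
pred_cb = rng.normal(scale=0.1, size=(n_pred, dim, dim))  # predictor matrices
res_cb = rng.normal(scale=0.2, size=(n_res, dim))         # residual vectors

def vpq_encode(prev_spec, cur_spec):
    # Pick the predictor that best explains the current log-spectrum ...
    preds = pred_cb @ prev_spec
    i = int(np.argmin(((preds - cur_spec) ** 2).sum(-1)))
    # ... then quantize the remaining residual spectrum
    resid = cur_spec - preds[i]
    j = int(np.argmin(((res_cb - resid) ** 2).sum(-1)))
    return i, j

def vpq_decode(prev_spec, i, j):
    return pred_cb[i] @ prev_spec + res_cb[j]
```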
{"title":"Vector predictive quantization of the spectral parameters for low rate speech coding","authors":"Y. Shoham","doi":"10.1109/ICASSP.1987.1169360","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169360","url":null,"abstract":"Vector Predictive Quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra, using a predictor codebook. The residual spectrum is coded by a residual codebook. The system operates in the log-spectral domain using a sampled version of the spectral envelope. Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 Kbits/s CELP coder by a VPQ system allows a reduction of the rate assigned to the LPC data from 1.8 Kbits/s to 1.0 Kbits/s without any obvious difference in the perceptual quality.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"2 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131170554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise
F. Soong, M. Sondhi
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169899
The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, especially if it is not feasible to train and test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure that weights high-SNR frequency regions more than low-SNR regions. For the weighting function we choose a "bandwidth-broadened" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of the SNR and becomes essentially constant in the noise-free case. The new measure has the dot-product form and computational efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10-speaker, isolated-digit database in a series of speaker-independent speech recognition experiments. Additive white Gaussian noise was used to simulate different SNR conditions (from 5 dB to ∞). The new measure performs as well as the original unweighted Itakura distortion measure at high SNRs, and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49%, while the original Itakura distortion gives an error rate of 27.6%. The equivalent SNR improvement at low SNRs is about 5 to 7 dB.
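For reference, the unweighted Itakura measure in its efficient dot-product form in the autocorrelation domain (the paper's SNR-adaptive frequency weighting is not reproduced here, and the helper names are ours). With LPC vectors a = [1, a1, ..., ap] and test-frame autocorrelation r_x, the quadratic form a'Ra reduces to a dot product of autocorrelation sequences:

```python
import numpy as np

def coeff_autocorr(a, p):
    # Autocorrelation of the LPC coefficient sequence, lags 0..p
    return np.array([np.dot(a[: len(a) - k], a[k:]) for k in range(p + 1)])

def itakura(a_ref, a_test, r_x):
    """d = log( a_ref' R a_ref / a_test' R a_test ), with R the Toeplitz
    matrix of the test-frame autocorrelation r_x, evaluated with dot
    products only (no explicit matrix is formed)."""
    p = len(r_x) - 1
    def quad(a):                              # a' R a via autocorrelations
        ra = coeff_autocorr(a, p)
        return ra[0] * r_x[0] + 2.0 * np.dot(ra[1:], r_x[1:])
    return np.log(quad(a_ref) / quad(a_test))
```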
{"title":"A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise","authors":"F. Soong, M. Sondhi","doi":"10.1109/ICASSP.1987.1169899","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169899","url":null,"abstract":"The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, specially if it is not feasible to train and to test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure which weights the high SNR regions in frequency more than the low SNR regions. For the weighting function we choose a \"bandwidth broadened\" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of SNR, and becomes essentially constant in the noise-free case. The new measure has the dot product form and computaional efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10 speaker, isolated digit data base in a series of speaker independent speech recognition experiments. Additive white Gaussian noise was used to simulate different SNR conditions (from 5 dB to ∞ dB). The new measure performs as well as the original unweighted Itakura distortion measure at high SNR's, and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49% while the original Itakura distortion gives an error rate of 27.6%. The equivalent SNR improvement at low SNR's, is about 5 - 7 dB.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116527103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BYBLOS: The BBN continuous speech recognition system
Y. Chow, M. O. Dunham, O. Kimball, M. Krasner, G. Kubala, J. Makhoul, P. Price, Salim Roukos, R. Schwartz
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169748
In this paper, we describe BYBLOS, the BBN continuous speech recognition system. The system, designed for large-vocabulary applications, integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance. The basic approach, as described in previous papers [1, 2], makes extensive use of robust context-dependent models of phonetic coarticulation using Hidden Markov Models (HMM). We describe the components of the BYBLOS system: the signal-processing front end, dictionary, phonetic model training system, word model generator, grammar, and decoder. In recognition experiments, we demonstrate consistently high word recognition performance on continuous speech across speakers, task domains, and grammars of varying complexity. In speaker-dependent mode, where 15 minutes of speech are required to train the system to a speaker, 98.5% word accuracy has been achieved in continuous speech for a 350-word task, using grammars with perplexity ranging from 30 to 60. With only 15 seconds of training speech, we demonstrate 97% accuracy using a grammar.
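BYBLOS itself is not available in code form; as a stand-in for the decoder component, here is the standard log-domain Viterbi recursion at the heart of any HMM recognizer (the discrete-observation formulation and all matrices are illustrative placeholders, not BBN's implementation).

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """log_pi: (S,) initial log-probs, log_A: (S,S) transition log-probs,
    log_B: (S,V) emission log-probs, obs: sequence of observation indices."""
    delta = log_pi + log_B[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + log_A      # (S,S): prev state -> next state
        back.append(scores.argmax(0))        # best predecessor per next state
        delta = scores.max(0) + log_B[:, o]
    # Trace back the best state sequence
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1], float(delta.max())
```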
{"title":"BYBLOS: The BBN continuous speech recognition system","authors":"Y. Chow, M. O. Dunham, O. Kimball, M. Krasner, G. Kubala, J. Makhoul, P. Price, Salim Roukos, R. Schwartz","doi":"10.1109/ICASSP.1987.1169748","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169748","url":null,"abstract":"In this paper, we describe BYBLOS, the BBN continuous speech recognition system. The system, designed for large vocabulary applications, integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance. The basic approach, as described in previous papers [1, 2], makes extensive use of robust context-dependent models of phonetic coarticulation using Hidden Markov Models (HMM). We describe the components of the BYBLOS system, including: signal processing frontend, dictionary, phonetic model training system, word model generator, grammar and decoder. In recognition experiments, we demonstrate consistently high word recognition performance on continuous speech across: speakers, task domains, and grammars of varying complexity. In speaker-dependent mode, where 15 minutes of speech is required for training to a speaker, 98.5% word accuracy has been achieved in continuous speech for a 350-word task, using grammars with perplexity ranging from 30 to 60. With only 15 seconds of training speech we demonstrate performance of 97% using a grammar.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114854489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lemniscate transform: A new efficient technique for shape coding and representation
A. Kundu
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169666
In this paper, a new algorithm for visual shape coding and representation, based on a powerful theorem on algebraic curves, is presented. The algorithm codes a closed shape that never intersects itself by means of the foci of the closed curve and a distance parameter p, such that the product of the distances from any point on the shape to the foci is approximately constant and equal to p. The computation of the foci coordinates and the parameter p is formulated as a linearized least-squares problem. The reconstruction algorithm is a straightforward implementation of the theorem. Experimental results indicating the success of the algorithm are also provided.
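A hedged sketch of the defining property (the least-squares fitting step is not shown; the foci and curve below are a textbook example, not the paper's data): on a lemniscate of Bernoulli with foci at ±1, the product of distances from every boundary point to the foci equals the constant p = 1.

```python
import numpy as np

foci = np.array([-1.0 + 0.0j, 1.0 + 0.0j])       # two foci on the real axis
p = 1.0                                          # constant distance product

# Boundary points of the Bernoulli lemniscate in polar form: r^2 = 2 cos(2 phi)
phi = np.linspace(0.0, np.pi / 4 - 0.05, 50)
r = np.sqrt(2.0 * np.cos(2.0 * phi))
z = r * np.exp(1j * phi)

prod = np.prod(np.abs(z[:, None] - foci[None, :]), axis=1)
print(np.allclose(prod, p))                      # True: the product is constant
```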
{"title":"Lemniscate transform: A new efficient technique for shape coding and representation","authors":"A. Kundu","doi":"10.1109/ICASSP.1987.1169666","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169666","url":null,"abstract":"In this paper, a new algorithm for visual shape coding and representation using a powerful theorem on algebraic curve has been presented. The algorithm codes the closed shape, which does never intersect itself, by means of the foci of the closed curve and a distance parameter 'p' such that the product of distances of any point on the shape from the foci is approximately constant and is equal to 'p'. The computation of the foci co-ordinates and 'p' parameter has been presented as a solution to a linearized least-square problem. The reconstruction algorithm is based on straightforward implementation of the theorem. Some experimental results have also been provided indicating the success of the algorithm.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115805081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A criticism of the parametric EEG spike detector
M. Beddoes, L. Panych, Juan Qian, J. Wada
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169635
The role of the parametric stage is studied under various conditions, and the following points are demonstrated:
- With a high sampling rate (200 Hz) but otherwise favourable conditions, as the filter order p is increased from zero to nineteen, the signal-to-noise ratio at the output of the parametric stage remains the same as at the input. Under less favourable conditions, it can fall as p is increased.
- Comparable performance (in terms of spikes detected) is obtained when the parametric stage is omitted entirely and detection is based only on the very simple non-parametric stage.
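A hedged sketch of the kind of parametric stage under discussion (this is not the authors' detector; the order, threshold, and test signal are illustrative): inverse-filter the signal with an order-p autoregressive model and flag samples whose prediction error is large.

```python
import numpy as np

def ar_prediction_error(x, p):
    # Least-squares AR(p) fit, then the one-step prediction residual
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return x[p:] - X @ coef

x = np.random.default_rng(0).standard_normal(2000)
x[1000] += 8.0                                   # injected "spike"
err = ar_prediction_error(x, p=10)
spikes = np.nonzero(np.abs(err) > 5 * err.std())[0] + 10  # offset by order p
print(spikes)                                    # indices near sample 1000
```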
{"title":"A criticism of the parametric EEG spike detector","authors":"M. Beddoes, L. Panych, Juan Qian, J. Wada","doi":"10.1109/ICASSP.1987.1169635","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169635","url":null,"abstract":"The role of the parametric stage is studied under various conditions and the following points have been demonstrated: -With high sampling rate (200 Hz) but otherwise favourable conditions, as the filter order, p, is increased from zero to nineteen the signal-to-noise ratio at the output of the parametric stage remains the same as at the input. Under less favourable conditions, it can fall as p is increased. -We find that comparable performance (in terms of spikes detected) is obtained when the parametric stage is omitted entirely, and detection is based only on the very simple non-parametric stage.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115141156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-style training for robust isolated-word speech recognition
R. Lippmann, E. A. Martin, D. Paul
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169544
A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained under those conditions. Instead of speaking normally during training, talkers use different, easily produced talking styles. The technique was tested using a speech database that included speech produced under stress during a workload task and while intense noise was presented through earphones. A continuous-distribution, talker-dependent Hidden Markov Model (HMM) recognizer was trained both normally (five normally spoken tokens) and with multi-style training (one token each from the normal, fast, clear, loud, and question-pitch talking styles). With multi-style training, the average error rate under stress and normal conditions fell by more than a factor of two, and the average error rate under conditions sampled during training fell by a factor of four.
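A hedged sketch of assembling the two training conditions being compared (the loader interface and its parameters are illustrative assumptions, not the authors' setup): the same token budget is spent either on five normal tokens or on one token per style.

```python
STYLES = ["normal", "fast", "clear", "loud", "question_pitch"]

def training_tokens(word, load_token, multi_style=True):
    """Return 5 training tokens for `word`: one per talking style for
    multi-style training, or 5 normally spoken tokens for the baseline.
    load_token(word, style, take) is an assumed corpus-access helper."""
    if multi_style:
        return [load_token(word, style, take=0) for style in STYLES]
    return [load_token(word, "normal", take=i) for i in range(5)]
```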
{"title":"Multi-style training for robust isolated-word speech recognition","authors":"R. Lippmann, E. A. Martin, D. Paul","doi":"10.1109/ICASSP.1987.1169544","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169544","url":null,"abstract":"A new training procedure called multi-style training has been developed to improve performance when a recognizer is used under stress or in high noise but cannot be trained in these conditions. Instead of speaking normally during training, talkers use different, easily produced, talking styles. This technique was tested using a speech data base that included stress speech produced during a workload task and when intense noise was presented through earphones. A continuous-distribution talker-dependent Hidden Markov Model (HMM) recognizer was trained both normally (5 normally spoken tokens) and with multi-style training (one token each from normal, fast, clear, loud, and question-pitch talking styles). The average error rate under stress and normal conditions fell by more than a factor of two with multi-style training and the average error rate under conditions sampled during training fell by a factor of four.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127689869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}