Title: Estimating the number of sinusoids in additive white-noise
Authors: J. Fuchs
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169582
Abstract: For a random process that can be modeled as a sum of real sinusoids in white noise, we address the problem of estimating the number of sinusoids. The proposed test uses the eigen-decomposition of the estimated autocorrelation matrix and is based on matrix perturbation analysis. The estimator is shown to resolve closely spaced sinusoids at quite low signal-to-noise ratios.
Title: A transform based covariance differencing approach to bearing estimation
Authors: S. Prasad, Ronald T. Williams, Arijit K. Mahalanabis, L. Sibul
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169850
Abstract: In recent years a new and very powerful technique for parameter estimation - the eigenstructure, or signal subspace, method - has been developed. Eigenstructure algorithms are closely related to Pisarenko's method for estimating the frequencies of sinusoids in white Gaussian noise. In theory they yield asymptotically unbiased estimates of arbitrarily close parameters, independent of the signal-to-noise ratio (SNR). Although signal subspace methods have proven to be powerful tools, they are not without drawbacks. An important weakness of all signal subspace algorithms is their need to know the noise covariance explicitly. The important problem of developing signal-subspace-based procedures for signals in noise fields with unknown covariance has not been satisfactorily addressed. We propose a solution to the problem of direction-of-arrival (DOA) estimation for a broad class of unknown noise fields, and then briefly discuss other important estimation problems to which modified versions of this procedure can be applied.
Title: Variable region vector quantization, space warping and speech/image compression
Authors: Y. Matsuyama
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169359
Abstract: Algorithms for vector quantization of variable-region data are given, and the design iteration is proved to converge. An important issue here is the step that optimizes the region shape with respect to the vector quantization codebook; the presented design method is thus a nontrivial extension of ordinary vector quantizer design, which contains the classical Lloyd-Max algorithm. The main algorithm is first given without introducing any physical entity, so the method is applicable to any data, including speech and images, as long as the quantization distortion is defined. In the speech coding case, which is the main body of this paper, the region shape optimization is interpreted as an epoch interval adjustment. Selecting the adjusted epochs with respect to the vector quantization codebook considerably reduces the quantizing distortion, which enables very-low-rate speech compression. The image coding case is then formulated and a related convergence problem is discussed.
Title: Speech recognition with very large size dictionary
Authors: B. Mérialdo
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169731
Abstract: This paper proposes a new strategy, Multi-Level Decoding (MLD), that allows the use of a Very Large Size Dictionary (VLSD, more than 100,000 words) in speech recognition. MLD proceeds in three steps:
- a Syllable Match procedure uses an acoustic model to build a list of the most probable syllables matching the acoustic signal from a given time frame;
- from this list, a Word Match procedure uses the dictionary to build partial word hypotheses;
- a Sentence Match procedure then uses a probabilistic language model to build partial sentence hypotheses until complete sentences are found.
An original matching algorithm is proposed for the Syllable Match procedure. The strategy is evaluated on a dictation task of French texts with two different dictionaries: one composed of the 10,000 most frequent words, the other composed of 200,000 words. The recognition results are given and compared. The word error rate with 10,000 words is 17.3%; if errors due to the lack of coverage are not counted, it is reduced to 10.6%. The error rate with 200,000 words is 12.7%.
Title: Vector predictive quantization of the spectral parameters for low rate speech coding
Authors: Y. Shoham
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169360
Abstract: Vector Predictive Quantization (VPQ) is proposed for coding the short-term spectral envelope of speech. The proposed VPQ scheme predicts the current spectral envelope from several past spectra using a predictor codebook, and the residual spectrum is coded with a residual codebook. The system operates in the log-spectral domain on a sampled version of the spectral envelope. Experimental results indicate a prediction gain in the range of 9 to 13 dB and an average log-spectral distance of 1.3 to 1.7 dB. Informal listening tests suggest that replacing the conventional scalar quantizer in a 4.8 kbit/s CELP coder with a VPQ system allows the rate assigned to the LPC data to be reduced from 1.8 kbit/s to 1.0 kbit/s without any obvious difference in perceptual quality.
Title: A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise
Authors: F. Soong, M. Sondhi
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169899
Abstract: The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, especially if it is not feasible to train and test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure which weights high-SNR frequency regions more than low-SNR regions. For the weighting function we choose a "bandwidth-broadened" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of the SNR and becomes essentially constant in the noise-free case. The new measure has the dot-product form and computational efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10-speaker, isolated-digit database in a series of speaker-independent speech recognition experiments, with additive white Gaussian noise used to simulate SNR conditions from 5 dB to noise-free. The new measure performs as well as the original unweighted Itakura distortion measure at high SNRs and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49%, while the original Itakura distortion gives 27.6%; the equivalent SNR improvement at low SNRs is about 5 to 7 dB.
Title: Lemniscate transform: A new efficient technique for shape coding and representation
Authors: A. Kundu
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169666
Abstract: This paper presents a new algorithm for visual shape coding and representation based on a powerful theorem on algebraic curves. The algorithm codes a closed, non-self-intersecting shape by the foci of the closed curve and a distance parameter 'p', such that the product of the distances from any point on the shape to the foci is approximately constant and equal to 'p'. The computation of the foci coordinates and the parameter 'p' is posed as a linearized least-squares problem. The reconstruction algorithm is a straightforward implementation of the theorem. Experimental results indicating the success of the algorithm are also provided.
Title: Subband/Transform coding using filter bank designs based on time domain aliasing cancellation
Authors: J. Princen, A. Johnson, A. B. Bradley
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169405
Abstract: A new, oddly stacked, critically sampled, single side-band (SSB) [7] analysis/synthesis system based on Time Domain Aliasing Cancellation (TDAC) [1],[2] is described in this paper. The specifications for the analysis and synthesis filter responses are developed and a number of designs which satisfy the reconstruction requirements are described. The application of TDAC systems to Subband/Transform coding is also discussed and the objective performance of a 32 band coder using several different window designs is presented and compared with a coder based on Frequency Domain Aliasing Cancellation (FDAC) filter banks [3]-[5].
Title: A criticism of the parametric EEG spike detector
Authors: M. Beddoes, L. Panych, Juan Qian, J. Wada
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169635
Abstract: The role of the parametric stage is studied under various conditions and the following points are demonstrated:
- With a high sampling rate (200 Hz) but otherwise favourable conditions, the signal-to-noise ratio at the output of the parametric stage remains the same as at the input as the filter order p is increased from zero to nineteen. Under less favourable conditions, it can fall as p is increased.
- Comparable performance (in terms of spikes detected) is obtained when the parametric stage is omitted entirely and detection is based only on the very simple non-parametric stage.
Title: Vector quantization firmware for an acoustical front-end using the TMS32020
Authors: A. Ciaramella, G. Venuti
Pub Date: 1987-04-06 | DOI: 10.1109/ICASSP.1987.1169338
Abstract: We describe the firmware implementation of an acoustical front-end that performs vector quantization of Discrete Cosine Transform (DCT) coefficients for a speech recognition system. The firmware runs on a single TMS32020 signal-processor chip and is characterized both by real-time performance and by good accuracy.