Set-membership theory applied to linear prediction analysis of speech
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169574
J. Deller, T. Luk
The theory of set-membership (SM) identification is formulated and applied to linear prediction (LP) analysis of speech. The LP parameters of a simulated vowel are identified as an illustration. The SM strategy yields significant computational savings because data that are informationless in the SM sense are rejected.
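As a rough illustration of the data-rejection idea, the sketch below runs an RLS-like recursion that simply skips any sample whose prediction error already lies inside an assumed error bound. The bound, the update rule, and the synthetic "vowel" are assumptions for illustration only, not the paper's SM algorithm.

```python
import numpy as np

def sm_lp_identify(x, order=10, err_bound=0.01, lam=1.0):
    """Toy set-membership-style LP identification (illustrative sketch only).

    Performs an RLS-like recursion but skips the update whenever the current
    prediction error is already inside the assumed error bound, i.e. the new
    sample is treated as informationless.
    """
    theta = np.zeros(order)             # LP coefficient estimate
    P = np.eye(order) * 1e3             # inverse "information" matrix
    n_used = 0
    for n in range(order, len(x)):
        phi = x[n - order:n][::-1]      # past samples (regressor)
        e = x[n] - phi @ theta          # prediction error
        if abs(e) <= err_bound:         # sample adds no information: skip
            continue
        n_used += 1
        k = P @ phi / (lam + phi @ P @ phi)
        theta = theta + k * e           # selective RLS-style update
        P = (P - np.outer(k, phi @ P)) / lam
    return theta, n_used

# Example: identify a 2nd-order AR "vowel-like" resonance from synthetic data
rng = np.random.default_rng(0)
a_true = [1.5, -0.9]
x = np.zeros(4000)
for n in range(2, len(x)):
    x[n] = a_true[0]*x[n-1] + a_true[1]*x[n-2] + 0.005*rng.standard_normal()
theta, n_used = sm_lp_identify(x, order=2, err_bound=0.02)
print(theta, n_used, "updates out of", len(x) - 2)
```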
{"title":"Set-membership theory applied to linear prediction analysis of speech","authors":"J. Deller, T. Luk","doi":"10.1109/ICASSP.1987.1169574","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169574","url":null,"abstract":"The theory of set membership (SM) identification is formulated, and applied to linear prediction (LP) analysis of speech. The LP parameters of a simulated vowel are identified as an illustration. The SM strategy results in a significant computational savings due to rejection of data which are informationless in the SM sense.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123512556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mixed-phase deconvolution of speech based on a sine-wave model
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169573
T. Quatieri, R. McAulay
This paper describes a new method of deconvolving the vocal cord excitation and vocal tract system response. The technique relies on a sine-wave representation of the speech waveform and forms the basis of an analysis-synthesis method which yields synthetic speech essentially indistinguishable from the original. Unlike an earlier sinusoidal analysis-synthesis technique that used a minimum-phase system estimate, the approach in this paper generates a "mixed-phase" system estimate and thus an improved decomposition of excitation and system components. Since a mixed-phase system estimate is removed from the speech waveform, the resulting excitation residual is less dispersed than the previous sinusoidal-based excitation estimate or the more commonly used linear prediction residual. A method of time-varying linear filtering is given as an alternative to sinusoidal reconstruction, similar to conventional time-domain synthesis used in certain vocoders, but without the requirement of pitch and voicing decisions. Finally, speech modification with a mixed-phase system estimate is shown to be capable of more closely preserving waveform shape in time-scale and pitch transformations than the earlier approach.
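For readers unfamiliar with the sine-wave representation the method builds on, the following minimal sketch synthesizes one frame as a sum of sinusoids. The frame length, sampling rate, and harmonic parameters are made-up values, and the paper's parameter tracking and mixed-phase system estimation are not reproduced.

```python
import numpy as np

def sine_wave_synthesis(amps, freqs, phases, n_samples, fs):
    """Reconstruct a frame of speech as a sum of sinusoids.

    A minimal sketch of the sine-wave representation underlying this family
    of analysis-synthesis systems: each spectral peak k contributes
    A_k * cos(2*pi*f_k*t + phi_k).
    """
    t = np.arange(n_samples) / fs
    frame = np.zeros(n_samples)
    for A, f, phi in zip(amps, freqs, phases):
        frame += A * np.cos(2 * np.pi * f * t + phi)
    return frame

# Example: a crude "voiced" frame built from harmonics of a 120 Hz pitch
fs = 8000
harmonics = np.arange(1, 20) * 120.0
amps = 1.0 / np.arange(1, 20)           # assumed spectral tilt
phases = np.zeros_like(harmonics)       # zero phase, for illustration only
y = sine_wave_synthesis(amps, harmonics, phases, n_samples=160, fs=fs)
```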
{"title":"Mixed-phase deconvolution of speech based on a sine-wave model","authors":"T. Quatieri, R. McAulay","doi":"10.1109/ICASSP.1987.1169573","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169573","url":null,"abstract":"This paper describes a new method of deconvolving the vocal cord excitation and vocal tract system response. The technique relies on a sine-wave representation of the speech waveform and forms the basis of an analysis-synthesis method which yields synthetic speech essentially indistinguishable from the original. Unlike an earlier sinusoidal analysis-synthesis technique that used a minimum-phase system estimate, the approach in this paper generates a \"mixed-phase\" system estimate and thus an improved decomposition of excitation and system components. Since a mixed-phase system estimate is removed from the speech waveform, the resulting excitation residual is less dispersed than the previous sinusoidal-based excitation estimate or the more commonly used linear prediction residual. A method of time-varying linear filtering is given as an alternative to sinusoidal reconstruction, similar to conventional time-domain synthesis used in certain vocoders, but without the requirement of pitch and voicing decisions. Finally, speech modification with a mixed-phase system estimate is shown to be capable of more closely preserving waveform shape in time-scale and pitch transformations than the earlier approach.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123652862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient speaker-independent automatic speech recognition by simulation of some properties of human auditory perception
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169803
H. Hermansky
An auditory model of speech perception, perceptually based linear predictive analysis with the root-power-sum metric (PLP-RPS), is applied as the front-end of an automatic speech recognizer (ASR). The PLP-RPS front-end is compared with the standard linear predictive cepstral metric (LP-CEP) front-end, and with LP-RPS and PLP-CEP front-ends. Two-spectral-peak models are the most efficient at modeling the linguistic information in speech; consequently, in speaker-independent ASR, high-analysis-order front-ends are less effective than low-order front-ends. Synthetic speech is used for front-end evaluation. Some of the perceptual inconsistencies of standard LP front-ends are alleviated in PLP front-ends. The PLP-RPS front-end is the most sensitive to the harmonic structure of the speech spectrum. Perceptual experiments indicate similar tendencies in human auditory perception.
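The abstract contrasts cepstral (CEP) and root-power-sum (RPS) style distance measures on LP-derived spectra. The sketch below computes the LP cepstrum and two distances between LP models: a plain Euclidean cepstral distance and an index-weighted variant often associated with RPS-type measures. Whether this matches the paper's exact RPS definition is an assumption.

```python
import numpy as np

def lp_cepstrum(a, n_ceps):
    """LP coefficients a[1..p] (convention A(z) = 1 - sum a_k z^-k) -> cepstrum."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def cep_distance(c1, c2):
    """Plain Euclidean cepstral distance (a CEP-type metric)."""
    return np.sqrt(np.sum((c1 - c2) ** 2))

def rps_like_distance(c1, c2):
    """Index-weighted cepstral distance, one common form of an RPS-type metric
    (emphasizes spectral slope rather than level)."""
    k = np.arange(1, len(c1) + 1)
    return np.sqrt(np.sum((k * (c1 - c2)) ** 2))

# Example: compare two low-order LP models
c1 = lp_cepstrum(np.array([1.4, -0.8]), n_ceps=12)
c2 = lp_cepstrum(np.array([1.3, -0.7]), n_ceps=12)
print(cep_distance(c1, c2), rps_like_distance(c1, c2))
```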
{"title":"An efficient speaker-independent automatic speech recognition by simulation of some properties of human auditory perception","authors":"H. Hermansky","doi":"10.1109/ICASSP.1987.1169803","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169803","url":null,"abstract":"An auditory model of speech perception, the Perceptually based linear predictive analysis with Root power sum metric (PLP-RPS), is applied as the front-end of an automatic speech recognizer (ASR). The PLP-RPS front-end is compared with standard linear predictive-cepstral metric (LP-CEP) front-end, and with LP-RPS and PLP-CEP front-ends. The two-spectral-peak models are the most efficient in modeling of linguistic information in speech. Consequently, in speaker-independent ASR, high analysis order front-ends are less effective than low-order front-ends. Synthetic speech is used for front-end evaluation. Some of perceptual inconsistencies of standard LP front-ends are alleviated in PLP front-ends. The PLP-RPS front-end is most sensitive to harmonic structure of speech spectrum. Perceptual experiments indicate similar tendencies in human auditory perception.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115225732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Localization of coherent sources using a modified spatial smoothing technique
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169420
Ronald T. Williams, S. Prasad, A. Mahalanabis, L. Sibul
Through both theoretical development and simulation, the proposed method has been shown to yield improved performance over the conventional spatial smoothing algorithm. Our discussion demonstrates that, for a given array length, the proposed method can be used to increase the number of coherent signals that can be resolved, effectively increasing the array aperture. Our simulation results underscore this fact. As we show, this increase in aperture is obtained, to some extent, at the expense of robustness.
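For context, conventional (forward-only) spatial smoothing, the baseline against which the modified technique is compared, averages the covariances of overlapping subarrays; a minimal sketch follows. The paper's modification itself is not reproduced here.

```python
import numpy as np

def spatially_smoothed_covariance(R, subarray_size):
    """Conventional forward spatial smoothing of an array covariance matrix.

    R is the M x M sample covariance of a uniform linear array.  The smoothed
    covariance is the average of the covariances of all overlapping subarrays
    of length `subarray_size`; this decorrelates coherent sources at the cost
    of reducing the effective aperture.
    """
    M = R.shape[0]
    L = M - subarray_size + 1           # number of overlapping subarrays
    Rs = np.zeros((subarray_size, subarray_size), dtype=complex)
    for i in range(L):
        Rs += R[i:i + subarray_size, i:i + subarray_size]
    return Rs / L
```

The smoothed matrix can then be passed to any subspace-based estimator (e.g. MUSIC) that would otherwise fail on fully coherent sources.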
{"title":"Localization of coherent sources using a modified spatial smoothing technique","authors":"Ronald T. Williams, S. Prasad, A. Mahalanabis, L. Sibul","doi":"10.1109/ICASSP.1987.1169420","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169420","url":null,"abstract":"Through both theoretical developement and simulation the method we have put forward has been shown to yield improved performance over the conventional spatial smoothing algorithm. Our discussion has demonstrated that for a given array length the proposed method can be used to increase the number of coherent signals that can be resolved and thus effectively increase array aperture. Our simulation results underscore this fact. As we will see this increase in aperture is obtained, to some extent, at the expense of robustness.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129218078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new approach to partially adaptive arrays
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169367
L. Griffiths
When an adaptive array operates in the presence of white noise only, the resulting beam pattern is referred to as the quiescent response. Typically, these patterns have mainlobe and sidelobe shapes differing from those designed for use in deterministic, non-adaptive arrays. This paper describes a simple method which allows nearly arbitrary specification of the quiescent response in a linearly constrained power-minimization adaptive array. The only restriction on the quiescent response is that it must meet the constraints defined for the adaptive array. Since many well-known deterministic designs, such as Chebyshev designs, are not likely to meet the linear constraint conditions used in adaptive arrays for mainlobe and other pattern-control functions, a procedure is presented which modifies the deterministic design to force it to meet the linear constraints in a least-squares manner. Once this has been accomplished, the methods outlined in this paper can be used to cause the modified deterministic design to become the quiescent response of the adaptive array. As a result, the adaptive array can be configured to closely resemble a deterministic array when the noise is white. Under conditions of correlated interference, or jamming, however, the response changes so as to effectively steer nulls in the appropriate directions. The method is based on the use of a generalized sidelobe canceller and requires one additional linear constraint for both narrow-band and broad-band arrays. This added flexibility in a partially adaptive array allows the system to be configured so as to meet an arbitrary number M of linear constraints either at all times (using M degrees of freedom) or only under quiescent conditions (using a single constraint). Any intermediate mixture of these extreme positions is also possible.
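In the generalized-sidelobe-canceller view, the quiescent response of a linearly constrained beamformer is set by the constraint set alone; a minimal sketch of that standard quiescent weight computation follows, as background to the shaping method described above. The example array and constraint are assumptions.

```python
import numpy as np

def quiescent_weights(C, f):
    """Quiescent weight vector of a linearly constrained (LCMV/GSC) beamformer.

    With white noise only, the adaptive weights go to zero and the array
    response is set entirely by w_q = C (C^H C)^{-1} f, where the columns of C
    are the linear constraint vectors and f holds the desired responses.
    """
    return C @ np.linalg.solve(C.conj().T @ C, f)

# Example: 8-element ULA with a single unit-gain constraint at broadside
N = 8
d = np.arange(N)
steer = np.exp(1j * np.pi * d * np.sin(0.0))   # broadside steering vector
C = steer[:, None]                              # one constraint column
f = np.array([1.0])
w_q = quiescent_weights(C, f)                   # here: steer / N (uniform taper)
```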
{"title":"A new approach to partially adaptive arrays","authors":"L. Griffiths","doi":"10.1109/ICASSP.1987.1169367","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169367","url":null,"abstract":"When an adaptive array operates in the presence of white noise only, the resulting beam pattern is referred to as the quiescent response. Typically, these patterns have mainlobe and sidelobe shapes differing from those designed for use in deterministic, non-adaptive arrays. This paper describes a simple method which allows nearly arbitrary specification of the quiescent response in a linearly-constrained power minimization adaptive array. The only restriction on the quiescent is that it must meet the constraints defined for the adaptive array. Since many well-known deterministic designs such as Chebychev are not likely to meet the linear constraint conditions used in adaptive arrays for mainlobe and other pattern control functions, a procedure is presented which modifies the deterministic design to force it to meet the linear constraints in a least-squares manner. Once this has been accomplished, the methods outlined in this paper can be used to cause the modified deterministic design to become the quiescent response of the adaptive array. As a result, the adaptive array can be configured to closely resemble a deterministic array when the noise is white. Under conditions of correlated interference, or jamming, however, the response changes so as to effectively steer nulls in the appropriate directions. The method is based on the use of a generalized sidelobe canceller and requires one additional linear constraint for both narrow-band and broad-band arrays. This added flexibility in a partially adaptive array allows the system to be configured so as to meet an arbitrary number M of linear constraints either at all times (using M degrees of freedom) or only under quiescent conditions (using a single constraint). Any intermediate mixture of these extreme positions is also possible.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125861956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A segment vocoder algorithm for real-time implementation
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169363
Salim Roukos, A. Wilgus, W. Russell
In previous papers, we have described the segment vocoder, which transmits intelligible speech at 300 b/s in speaker-independent mode, i.e., new users need not train the system. As expected for vector quantizers, the storage and computational requirements of the segment vocoder are significantly larger than those of the standard LPC-10 vocoder. In this paper, we describe methods for reducing computational and storage requirements of the segment vocoder and present an algorithm that is implementable in real-time on hardware containing several Digital Signal Processing chips. The DRT score of the simplified algorithm is 78%.
{"title":"A segment vocoder algorithm for real-time implementation","authors":"Salim Roukos, A. Wilgus, W. Russell","doi":"10.1109/ICASSP.1987.1169363","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169363","url":null,"abstract":"In previous papers, we have described the segment vocoder, which transmits intelligible speech at 300 b/s in speaker-independent mode, i.e., new users need not train the system. As expected for vector quantizers, the storage and computational requirements of the segment vocoder are significantly larger than those of the standard LPC-10 vocoder. In this paper, we describe methods for reducing computational and storage requirements of the segment vocoder and present an algorithm that is implementable in real-time on hardware containing several Digital Signal Processing chips. The DRT score of the simplified algorithm is 78%.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131154170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A minimum discrimination information approach for hidden Markov modeling
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169727
Y. Ephraim, A. Dembo, L. Rabiner
A new iterative approach to hidden Markov modeling of information sources is proposed, which aims at minimizing the discrimination information (or cross-entropy) between the source and the model. This approach does not require the commonly used assumption that the source to be modeled is a hidden Markov process. The algorithm is initialized with the model estimated by the traditional maximum likelihood (ML) approach and alternately decreases the discrimination information over all probability distributions of the source that agree with the given measurements and over all hidden Markov models. The proposed procedure generalizes the Baum algorithm for ML hidden Markov modeling. The procedure is shown to be a descent algorithm for the discrimination information measure, and its local convergence is proved.
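As background, the sketch below computes the scaled forward-algorithm log-likelihood that the traditional ML (Baum) approach maximizes and from which the proposed MDI procedure is started; the alternating MDI minimization itself is not reproduced.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (scaled forward algorithm).

    pi: (S,) initial state probabilities, A: (S, S) transition matrix,
    B: (S, K) emission probabilities, obs: sequence of symbol indices.
    """
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()                       # scale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        log_lik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_lik
```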
{"title":"A minimum discrimination information approach for hidden Markov modeling","authors":"Y. Ephraim, A. Dembo, L. Rabiner","doi":"10.1109/ICASSP.1987.1169727","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169727","url":null,"abstract":"A new iterative approach for hidden Markov modeling of information sources which aims at minimizing the discrimination information (or the cross-entropy) between the source and the model is proposed. This approach does not require the commonly used assumption that the source to be modeled is a hidden Markov process. The algorithm is started from the model estimated by the traditional maximum likelihood (ML) approach and alternatively decreases the discrimination information over all probability distributions of the source which agree with the given measurements and all hidden Markov models. The proposed procedure generalizes the Baum algorithm for ML hidden Markov modeling. The procedure is shown to be a descent algorithm for the discrimination information measure and its local convergence is proved.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133761540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparative performance of ESPRIT and MUSIC for direction-of-arrival estimation
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169322
R. Roy, A. Paulraj, T. Kailath
ESPRIT is a new algorithm for signal parameter estimation with application to direction-of-arrival estimation in a multiple-source environment. It has considerable computational advantages over the well-known MUSIC algorithm; for example, it is faster and applies to sensor arrays with unknown and nearly arbitrary geometry, requiring neither array calibration nor the associated storage. Herein, results of computer simulations carried out to compare their resolution and error (bias and variance) performance are presented. A new multidimensional spectral measure for the MUSIC algorithm is also introduced, and preliminary investigations of its performance are presented.
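To make the comparison concrete, a minimal MUSIC pseudospectrum computation for a linear array is sketched below: this is the eigendecomposition-plus-search step whose calibration and search cost ESPRIT avoids. The array model and angle grid are illustrative assumptions.

```python
import numpy as np

def music_spectrum(R, n_sources, array_pos, angles_deg, wavelength):
    """MUSIC pseudospectrum for a linear array (illustrative sketch).

    Eigendecompose the sample covariance R, take the noise-subspace
    eigenvectors En, and evaluate 1 / ||En^H a(theta)||^2 over a grid of
    candidate arrival angles; peaks indicate source directions.
    """
    eigval, eigvec = np.linalg.eigh(R)
    En = eigvec[:, :-n_sources]                 # noise-subspace eigenvectors
    spectrum = np.empty(len(angles_deg))
    for i, th in enumerate(np.deg2rad(angles_deg)):
        a = np.exp(2j * np.pi * array_pos * np.sin(th) / wavelength)
        proj = En.conj().T @ a
        spectrum[i] = 1.0 / np.real(proj.conj() @ proj)
    return spectrum
```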
{"title":"Comparative performance of ESPRIT and MUSIC for direction-of-arrival estimation","authors":"R. Roy, A. Paulraj, T. Kailath","doi":"10.1109/ICASSP.1987.1169322","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169322","url":null,"abstract":"ESPRIT is a new algorithm for signal parameter estimation with applications to direction-of-arrival estimation in a multiple source environment. It has considerable computational advantages (e.g., faster and applies to sensor arrays with unknown and nearly arbitrary geometry requiring no array calibration and storage) over the well-known conventional MUSIC algorithm. Herein, results of computer simulations carried out to compare their resolution and error (bias and variance) performance are presented. A new multi-dimensional spectral measure for the MUSIC algorithm is also introduced and preliminary investigations of its performance are presented.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132784627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time delay estimation by autoregressive modelization
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169733
M. Pallas, N. Martin, J. Martin
From an active underwater acoustics experiment, we intend to estimate the time delays of multipath propagation in the case where the time differences of arrival are too small to be treated by classical methods. After estimating the propagation filter's transfer function, we apply autoregressive (AR) modeling to these frequency-domain data and deduce the delay values from the pole locations of the AR model. Simulations of the processing are presented. Finally, the method is applied to real data.
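A minimal sketch of the underlying idea, under assumed parameters: samples of a multipath transfer function taken on a uniform frequency grid form a sum of complex exponentials in the frequency index, so a linear-prediction (Prony-style) fit across frequency yields poles whose angles encode the delays. The paper's exact estimation procedure may differ.

```python
import numpy as np

def delays_from_transfer_function(H, df, order):
    """Estimate multipath delays from samples of a transfer function H(f).

    A channel H(f) = sum_m a_m * exp(-j*2*pi*f*tau_m), sampled every df Hz,
    is a sum of complex exponentials in the frequency index, so it can be
    fitted by an AR (linear-prediction) model across frequency; each pole
    z_m = exp(-j*2*pi*df*tau_m) then yields a delay estimate.
    """
    N = len(H)
    # Linear prediction across frequency: H[n] ~ -sum_i c[i] * H[n-i]
    rows = np.array([H[n - order:n][::-1] for n in range(order, N)])
    rhs = H[order:]
    c, *_ = np.linalg.lstsq(rows, -rhs, rcond=None)
    poles = np.roots(np.concatenate(([1.0], c)))
    # Delay from pole angle (unambiguous only for tau < 1/df)
    taus = (-np.angle(poles) / (2 * np.pi * df)) % (1.0 / df)
    return np.sort(taus)

# Example: two paths at 1.0 ms and 1.3 ms, 10 Hz frequency spacing
df, f = 10.0, np.arange(0, 4000.0, 10.0)
H = 1.0 * np.exp(-2j*np.pi*f*1.0e-3) + 0.7 * np.exp(-2j*np.pi*f*1.3e-3)
print(delays_from_transfer_function(H, df, order=2))   # ~[0.001, 0.0013]
```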
{"title":"Time delay estimation by autoregressive modelization","authors":"M. Pallas, N. Martin, J. Martin","doi":"10.1109/ICASSP.1987.1169733","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169733","url":null,"abstract":"From an active underwater acoustics experiment, we intend to estimate the time delays of the multipath propagation, in the case where the time differences of arrival are not large enough to be treated by classical methods. After estimating the propagation filter transfer function, we apply the autoregressive modelization to these frequential data. We deduce the delays values from the poles locations of the AR model. Simulations of the processing are presented. Finally, the method is applied to real data.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115465097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic programming speech recognition using a context-free grammar
Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169746
H. Ney
This paper deals with the use of context-free grammars in automatic speech recognition. A dynamic programming algorithm for recognizing and parsing spoken word strings of a context-free grammar is presented. The algorithm can be viewed as a probabilistic extension of the CYK algorithm along with the incorporation of nonlinear time alignment. Details of the implementation and experimental tests are described.
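For reference, a plain probabilistic CYK recognizer for a grammar in Chomsky normal form is sketched below; the paper's contribution is to fold nonlinear time alignment of the acoustic observations into this kind of recursion, which the sketch omits. The rule-table format is an assumption.

```python
import math
from collections import defaultdict

def prob_cyk(words, lexical, binary, start="S"):
    """Probabilistic CYK recognition for a grammar in Chomsky normal form.

    lexical: dict mapping (A, word) -> probability of rule A -> word
    binary:  dict mapping (A, B, C) -> probability of rule A -> B C
    Returns the log-probability of the best parse with root `start`, or -inf.
    """
    n = len(words)
    best = defaultdict(lambda: float("-inf"))   # best[(i, j, A)] = best log-prob
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i + 1, A)] = max(best[(i, i + 1, A)], math.log(p))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    score = math.log(p) + best[(i, k, B)] + best[(k, j, C)]
                    if score > best[(i, j, A)]:
                        best[(i, j, A)] = score
    return best[(0, n, start)]
```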
{"title":"Dynamic programming speech recognition using a context-free grammar","authors":"H. Ney","doi":"10.1109/ICASSP.1987.1169746","DOIUrl":"https://doi.org/10.1109/ICASSP.1987.1169746","url":null,"abstract":"This paper deels with the use of context-free grammars in automatic speech recognition. A dynamic programming algorithm for recognizing and parsing spoken word strings of a context-free grammar is presented. The algorithm can be viewed as a probabilistic extension of the CYK algorithm along with the incorporation of the nonlineer time alignment. Details of the implementation and experimental tests are described.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115663621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}