Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389831
A. Shaw, W. Xia
The minimum-norm method (MNM) for high-resolution direction-of-arrival (DOA) estimation relies on special-purpose hardware or software to obtain the signal and noise subspace eigenvectors of autocorrelation (AC) matrices. This paper shows that the DFT of the AC matrix (DFT-of-AC) performs an essentially equivalent task of separating the signal and noise subspaces. Furthermore, when the signal-subspace part of the DFT-of-AC vectors is used in the minimum-norm framework, almost identical high-resolution DOA estimates are produced. Compared with eigendecomposition-based MNM, the proposed DFT-based approach (D-MNM) has a lower computational load, while its bias, mean-squared error and root locations are nearly the same. The simulations further show that at low SNR the performance of D-MNM is more robust, and that it also has superior dynamic range.
{"title":"High-resolution direction of arrival estimation using minimum-norm method without eigendecomposition","authors":"A. Shaw, W. Xia","doi":"10.1109/ICASSP.1994.389831","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389831","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389230
H. Kawai, N. Higuchi, Tohru Shimizu, Seiichi Yamamoto
A text-to-speech system for Japanese was developed based on waveform splicing. A stored unit is a sequence of phonemes segmented at vowel-consonant boundaries. Four and eight phoneme groups are distinguished for the preceding and succeeding phonemic environment, respectively. An inventory of waveform segments covering 1020 frequently used units was constructed based on a statistical analysis of a text database consisting of 20 million phonemes. Each stored unit has, on average, 2.5 waveform segments with different fundamental frequency (F0) and phoneme duration. The F0 and phoneme duration are modified by a pitch-synchronous overlap-add (PSOLA) method. A time window with a flat portion at its center (Tukey window) was adopted in place of an ordinary Hanning window. A preference test indicated that the Tukey window gives better quality when the F0 is lowered. The articulation score in an intelligibility test was 89.2%.
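The difference between the two windows mentioned in the abstract can be illustrated with a minimal sketch. The function name `tukey_window` and the taper fraction `alpha` are assumptions made for illustration; the abstract does not state which taper fraction the authors used. With `alpha = 1.0` the window reduces to a symmetric Hann (Hanning) window, and smaller values widen the flat center.

```python
import math


def tukey_window(n, alpha=0.5):
    """Return an n-point Tukey (tapered-cosine) window as a list of floats.

    alpha is the fraction of the window occupied by the two cosine tapers
    (hypothetical parameter name): 0 gives a rectangular window,
    1 gives a symmetric Hann window.
    """
    if n < 1:
        return []
    if n == 1 or alpha <= 0.0:
        return [1.0] * n
    alpha = min(alpha, 1.0)
    width = alpha * (n - 1) / 2.0  # length of each taper, in samples
    window = []
    for i in range(n):
        if i < width:
            # rising cosine taper
            window.append(0.5 * (1.0 - math.cos(math.pi * i / width)))
        elif i > (n - 1) - width:
            # falling cosine taper, mirror image of the rising one
            window.append(0.5 * (1.0 - math.cos(math.pi * (n - 1 - i) / width)))
        else:
            # flat portion at the center
            window.append(1.0)
    return window


if __name__ == "__main__":
    for value in tukey_window(11, alpha=0.5):
        print(f"{value:.3f}")
```

In an overlap-add setting such a window would multiply each pitch-synchronous segment before the segments are summed; that step is not shown here.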
{"title":"Development of a text-to-speech system for Japanese based on waveform splicing","authors":"H. Kawai, N. Higuchi, Tohru Shimizu, Seiichi Yamamoto","doi":"10.1109/ICASSP.1994.389230","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389230","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389241
W. Chan, David Chemla
Tree-structured vector quantization (TSVQ) is employed as a low-complexity approach to vector quantization of speech linear prediction coefficients, expressed for the purpose of quantization as line spectral frequency (LSF) parameters. Good tradeoffs between search complexity and distortion-rate performance are obtained using multiple-survivor encoding. The exponential storage complexity of conventional TSVQ is circumvented by using multiple stages, where one or more tree codebooks may be used in each stage. Experimental results show that for rates of 23-25 bits/frame, the encoding complexity required to achieve "transparent coding" quality ranges from below two hundred to several hundred weighted-squared-error distortion computations per frame.
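The trade-off between search complexity and distortion under multiple-survivor encoding can be illustrated with a small sketch. Everything named here (`Node`, `tsvq_encode`, the `survivors` parameter, the toy one-dimensional tree) is hypothetical, and plain squared error stands in for the weighted squared error used in the paper. This is a single-stage tree, not the multi-stage constrained-storage design the abstract describes.

```python
class Node:
    """A tree codebook node: a centroid plus zero or more child nodes."""

    def __init__(self, centroid, children=()):
        self.centroid = centroid
        self.children = list(children)


def sq_err(x, c):
    """Unweighted squared error between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def tsvq_encode(top_level, x, survivors=1):
    """Search the tree level by level, keeping the `survivors` best nodes.

    Returns (codevector, distortion, number of distortion computations).
    survivors=1 is the ordinary greedy TSVQ search.
    """
    frontier = list(top_level)
    evaluations = 0
    while True:
        scored = [(sq_err(x, n.centroid), i, n) for i, n in enumerate(frontier)]
        evaluations += len(scored)
        scored.sort(key=lambda t: t[:2])  # by distortion, ties by position
        best = [n for _, _, n in scored[:survivors]]
        if all(not n.children for n in best):
            return best[0].centroid, scored[0][0], evaluations
        # Expand every survivor; a surviving leaf stays in the frontier.
        frontier = [c for n in best for c in (n.children or [n])]


# Toy tree in which the greedy search is misled at the first level.
tree = [
    Node((0.0,), [Node((-4.0,)), Node((4.9,))]),
    Node((10.0,), [Node((8.0,)), Node((12.0,))]),
]

if __name__ == "__main__":
    for m in (1, 2):
        print(m, tsvq_encode(tree, (5.2,), survivors=m))
```

In this toy case the two-survivor search spends six distortion computations instead of four and reaches a closer codevector, which is the kind of complexity-for-distortion exchange the abstract refers to.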
{"title":"Low-complexity encoding of speech LSF parameters using constrained-storage TSVQ","authors":"W. Chan, David Chemla","doi":"10.1109/ICASSP.1994.389241","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389241","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389665
Brigitte Colnet, J. Haton
Source localisation is among the most important steps in array processing. The authors present a neuromimetic approach to signal processing. A set of neural networks is used to find the azimuth of one or several sources impinging on a linear array of equally spaced sensors. Each network in this set is specialised to determine whether there is an emitter in a given angular sector. Each network thus has a specific architecture suited to detecting and enhancing the signal coming from the angular sector it is associated with. The performance of this method on real underwater signals confirms the encouraging results obtained in simulation tests.
{"title":"Far field array processing with neural networks","authors":"Brigitte Colnet, J. Haton","doi":"10.1109/ICASSP.1994.389665","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389665","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389275
J. McDonough, Kenney Ng, P. Jeanrenaud, H. Gish, J. R. Rohlicek
Topic identification (TID) is the automatic classification of speech messages into one of a known set of possible topics. The TID task can be viewed as having three principal components: 1) event generation, 2) keyword event selection, and 3) topic modeling. Using data from the Switchboard corpus, the authors present experimental results for various approaches to the TID problem and compare the relative effectiveness of each. In addition, they examine the effect of keyword set size on identification accuracy and gauge the loss in performance when mismatched topic modeling and keyword selection schemes are used.
{"title":"Approaches to topic identification on the switchboard corpus","authors":"J. McDonough, Kenney Ng, P. Jeanrenaud, H. Gish, J. R. Rohlicek","doi":"10.1109/ICASSP.1994.389275","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389275","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389408
Ashok Popat, Rosalind W. Picard
The performance of a statistical signal processing system is determined in large part by the accuracy of the probabilistic model it employs. Accurate modeling often requires working in several dimensions, but doing so can introduce dimensionality-related difficulties. A previously introduced model circumvents some of these difficulties while maintaining accuracy sufficient to account for much of the high-order, nonlinear statistical interdependence of samples. Properties of this model are reviewed, and its power is demonstrated by application to image restoration and compression. Also described is a vector quantization (VQ) scheme which employs the model in entropy coding a Z^N lattice. The scheme has the advantage over standard VQ of bounding maximum instantaneous errors.
{"title":"Cluster-based probability model applied to image restoration and compression","authors":"Ashok Popat, Rosalind W. Picard","doi":"10.1109/ICASSP.1994.389408","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389408","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389911
D. Thomson
Presents examples, history, and a brief review of the theory of multiple-window and quadratic-inverse spectrum estimation methods for mixed harmonizable processes. In addition to the standard uses of making consistent non-parametric auto- and cross-spectrum estimates with jackknife confidence intervals and estimating periodic components in coloured noise, quadratic-inverse theory gives a time-frequency decomposition for stochastic processes. This leads to new estimates of both common and less-familiar functions such as the "time-derivative" of a spectrum.
{"title":"An overview of multiple-window and quadratic-inverse spectrum estimation methods","authors":"D. Thomson","doi":"10.1109/ICASSP.1994.389911","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389911","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389774
A. Kavcic, Bin Yang
A new algorithm for signal subspace tracking is presented. It is based on an approximated singular value decomposition using interlaced QR-updating and Jacobi plane rotations. By forcing the noise subspace to be spherical, the computational complexity of the algorithm is brought down to O(nr), where n is the problem dimension and r is the desired number of signal components. The algorithm lends itself to a very efficient systolic array implementation, resulting in a throughput of O(n^0). Simulations show that the frequency tracking capabilities of the new method are at least as good as those of the computationally much more expensive exact singular value decomposition.
{"title":"A new efficient subspace tracking algorithm based on singular value decomposition","authors":"A. Kavcic, Bin Yang","doi":"10.1109/ICASSP.1994.389774","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389774","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389995
J. Gotze
Since the stage of diagonalization of Jacobi-type methods is difficult to monitor in a parallel environment, it is usually proposed to execute a predetermined number of sweeps (iterations) on a parallel processor array. A means of monitoring the stage of diagonalization is essential in order to avoid executing a significant number of unnecessary sweeps. Based on a lemma used for a generalized proof of the quadratic convergence of the Jacobi EVD and SVD methods, a new criterion for monitoring the stage of diagonalization is derived. Using this criterion, it can easily be monitored when the stage of quadratic convergence is reached (only one bit yields this information). Therefore, only the (small) number of quadratically convergent sweeps must be predetermined. A further, similar criterion that is particularly useful for Jacobi-type methods using CORDIC-based approximate rotations is also given.
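For context, the sweep structure that such monitoring applies to can be sketched with a serial cyclic Jacobi eigenvalue iteration. The sketch below tracks the conventional off-diagonal Frobenius norm after every sweep; this is a stand-in chosen for illustration, not the one-bit criterion derived in the paper, and the function names, tolerance and sweep limit are all assumptions.

```python
import math


def off_norm(a):
    """Frobenius norm of the off-diagonal part of a square matrix."""
    n = len(a)
    return math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))


def jacobi_sweep(a):
    """One cyclic sweep of plane rotations over a symmetric matrix, in place."""
    n = len(a)
    for p in range(n - 1):
        for q in range(p + 1, n):
            if a[p][q] == 0.0:
                continue
            # Angle that zeroes a[p][q], reduced to |theta| <= pi/4.
            theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
            if theta > math.pi / 4:
                theta -= math.pi / 2
            elif theta < -math.pi / 4:
                theta += math.pi / 2
            c, s = math.cos(theta), math.sin(theta)
            for k in range(n):  # column update: A <- A J
                akp, akq = a[k][p], a[k][q]
                a[k][p] = c * akp - s * akq
                a[k][q] = s * akp + c * akq
            for k in range(n):  # row update: A <- J^T A
                apk, aqk = a[p][k], a[q][k]
                a[p][k] = c * apk - s * aqk
                a[q][k] = s * apk + c * aqk


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=30):
    """Run sweeps until the off-diagonal norm drops below tol.

    Returns (approximate eigenvalues, off-norm recorded after each sweep).
    """
    a = [row[:] for row in matrix]
    history = [off_norm(a)]
    while history[-1] > tol and len(history) <= max_sweeps:
        jacobi_sweep(a)
        history.append(off_norm(a))
    return [a[i][i] for i in range(len(a))], history


if __name__ == "__main__":
    eigenvalues, history = jacobi_eigenvalues(
        [[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 1.0]]
    )
    print(eigenvalues)
    print(history)
```

Printing `history` shows the off-diagonal norm shrinking slowly at first and then very rapidly over the last few sweeps, which is the quadratically convergent stage the abstract refers to. Computing this norm requires a global reduction over the matrix, which is what makes monitoring awkward on a parallel processor array.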
{"title":"Monitoring the stage of diagonalization in Jacobi-type methods","authors":"J. Gotze","doi":"10.1109/ICASSP.1994.389995","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389995","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}
Pub Date: 1994-04-19 DOI: 10.1109/ICASSP.1994.389317
J. D. Verdejo, J. C. Segura, P. García-Teodoro, A. Rubio
This paper presents a new framework developed to apply Alphanets to CSR. For this purpose, a modular system is proposed. The system is made up of three modules: an LVQ module, an SLHMM module and a DP module. The SLHMM module is an expansion of an Alphanet and can therefore be interpreted as an HMM. The system can be trained globally by applying backpropagation techniques. The pruning procedure used is based on recognized units instead of observations, which reduces the number of nodes needed to recognize a sentence compared to HMM-based systems using the same model parameters. In addition, the training procedure re-adapts the weights to the new architecture in a few iterations, since the initial parameters can be estimated from a classical HMM CSR system.
{"title":"SLHMM: a continuous speech recognition system based on Alphanet-HMM","authors":"J. D. Verdejo, J. C. Segura, P. García-Teodoro, A. Rubio","doi":"10.1109/ICASSP.1994.389317","DOIUrl":"https://doi.org/10.1109/ICASSP.1994.389317","journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing"},"publicationDate":"1994-04-19"}